# Activate conda environment

In [1]:
conda activate qiime2

In [2]:
# Check the QIIME 2 version and follow the tutorial for that version
qiime --version

q2cli version 2021.4.004l
Run `qiime info` for more version details.

# Obtaining and importing data

In [3]:
# Check our metadata file
qiime metadata tabulate \
  --m-input-file metadata_16s.tsv \
  --o-visualization qzv/metadata.qzv

Saved Visualization to: qzv/metadata.qzv

### Import Data

In [6]:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest_v.2.tsv \
  --input-format PairedEndFastqManifestPhred33V2 \
  --output-path qza/paired-end-demux.qza

Imported manifest_v.2.tsv as PairedEndFastqManifestPhred33V2 to qza/paired-end-demux.qza
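The `manifest_v.2.tsv` used above is not shown in this notebook. The `PairedEndFastqManifestPhred33V2` format is a tab-separated file whose header row is `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`, followed by one row per sample. Below is a minimal bash sketch for building one. The `make_manifest` function and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` file naming are assumptions for illustration, not how the original manifest was made.

```bash
# Hypothetical helper: build a PairedEndFastqManifestPhred33V2 manifest from a
# directory of paired FASTQ files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
# (the naming scheme is an assumption).
make_manifest() {
  local fastq_dir fwd sample rev
  fastq_dir=$(cd "$1" && pwd) || return 1   # manifest paths must be absolute
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$fwd" ] || continue               # glob matched nothing
    sample=$(basename "$fwd" _R1.fastq.gz)
    rev="$fastq_dir/${sample}_R2.fastq.gz"
    if [ -e "$rev" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
    else
      echo "warning: no reverse-read file for $sample, skipping" >&2
    fi
  done
}
```

For example, `make_manifest raw_reads > manifest_v.2.tsv` (where `raw_reads` is a hypothetical directory) would write the manifest. Samples that are missing a reverse-read file are reported on stderr and skipped.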

In [4]:
qiime demux summarize --help

Usage: [34mqiime demux summarize[0m [OPTIONS]

  Summarize counts per sample for all samples, and generate interactive
  positional quality plots based on `n` randomly selected sequences.

[1mInputs[0m:
  [34m[4m--i-data[0m ARTIFACT [32mSampleData[SequencesWithQuality |[0m
    [32mPairedEndSequencesWithQuality | JoinedSequencesWithQuality][0m
                       The demultiplexed sequences to be summarized.
                                                                    [35m[required][0m
[1mParameters[0m:
  [34m--p-n[0m INTEGER        The number of sequences that should be selected at
                       random for quality score plots. The quality plots will
                       present the average positional qualities across all of
                       the sequences selected. If input sequences are paired
                       end, plots will be generated for both forward and
                       reverse reads for the same `n` sequences.
             

: 1

### Summary of the demultiplexed results

In [9]:
qiime demux summarize \
  --i-data qza/paired-end-demux.qza \
  --p-n 100000 \
  --o-visualization qzv/demux.qzv

Saved Visualization to: qzv/demux.qzv

In [7]:
# View the result; run this in a terminal (or upload the .qzv file to https://view.qiime2.org)
qiime tools view qzv/demux.qzv


Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

# Sequence quality control and feature table construction

### Option 1: DADA2 (preferred)

In [10]:
qiime dada2 denoise-paired --help

Usage: [34mqiime dada2 denoise-paired[0m [OPTIONS]

  This method denoises paired-end sequences, dereplicates them, and filters
  chimeras.

[1mInputs[0m:
  [34m[4m--i-demultiplexed-seqs[0m ARTIFACT [32mSampleData[PairedEndSequencesWithQuality][0m
                         The paired-end demultiplexed sequences to be
                         denoised.                                  [35m[required][0m
[1mParameters[0m:
  [34m[4m--p-trunc-len-f[0m INTEGER
                         Position at which forward read sequences should be
                         truncated due to decrease in quality. This truncates
                         the 3' end of the of the input sequences, which will
                         be the bases that were sequenced in the last cycles.
                         Reads that are shorter than this value will be
                         discarded. After this parameter is applied there must
                         still be at least a 12 nucleotide overla

: 1

In [11]:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs qza/paired-end-demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 0 \
  --verbose \
  --o-representative-sequences qza/rep-seqs.qza \
  --o-table qza/table.qza \
  --o-denoising-stats qza/denoise-stats.qza

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpn4d3zswn/forward /tmp/tmpn4d3zswn/reverse /tmp/tmpn4d3zswn/output.tsv.biom /tmp/tmpn4d3zswn/track.tsv /tmp/tmpn4d3zswn/filt_f /tmp/tmpn4d3zswn/filt_r 0 0 0 0 2.0 2.0 2 12 independent consensus 1.0 0 1000000

R version 4.0.3 (2020-10-10) 
Loading required package: Rcpp
DADA2: 1.18.0 / Rcpp: 1.0.6 / RcppParallel: 5.1.2 
1) Filtering ..........................................
2) Learning Error Rates
230749795 total bases in 1049068 reads from 11 samples will be used for learning the error rates.
230747125 total bases in 1049068 reads from 11 samples will be used for learning the error rates.
3) Denoise samples ..........................................
..........................................
4) Remove chimeras (method = c


### View summary of statistics produced by dada2

In [2]:
qiime metadata tabulate \
    --m-input-file qza/denoise-stats.qza \
    --o-visualization qzv/denoise-stats.qzv

Saved Visualization to: qzv/denoise-stats.qzv

## FeatureTable and FeatureData summaries
***

In [12]:
qiime feature-table summarize \
  --i-table qza/table.qza \
  --m-sample-metadata-file metadata_16s.tsv \
  --o-visualization qzv/table.qzv

qiime feature-table tabulate-seqs \
  --i-data qza/rep-seqs.qza \
  --o-visualization qzv/rep-seqs.qzv

Saved Visualization to: qzv/table.qzv
Saved Visualization to: qzv/rep-seqs.qzv

# Taxonomic Analysis
***

## Prepare Classifier for V3-V4 (341f-806r primer) region

I followed the [Training Feature Classifier](https://docs.qiime2.org/2021.4/tutorials/feature-classifier/) tutorial from the QIIME 2 documentation, using QIIME 2 version 2021.4.

### Data Collection

In [1]:
mkdir classifier


We will download the Greengenes 13_8 reference dataset (the most recent Greengenes release) to train our classifier. A second classifier is trained on the older 13_5 release.

In [None]:
wget -P /home/arriyaz/data/16S_metagenomics/classifiers/ \
    "ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz"

In [1]:
wget -P /home/arriyaz/data/16S_metagenomics/classifiers/ \
    "ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_5_otus.tar.gz"

--2021-10-18 16:07:53--  ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_5_otus.tar.gz
           => 'classifier/gg_13_5_otus.tar.gz'
Resolving greengenes.microbio.me (greengenes.microbio.me)... 169.228.46.98
Connecting to greengenes.microbio.me (greengenes.microbio.me)|169.228.46.98|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /greengenes_release/gg_13_5 ... done.
==> SIZE gg_13_5_otus.tar.gz ... 318327264
==> PASV ... done.    ==> RETR gg_13_5_otus.tar.gz ... done.
Length: 318327264 (304M) (unauthoritative)


2021-10-18 16:25:43 (299 KB/s) - 'classifier/gg_13_5_otus.tar.gz' saved [318327264]



### Import Data as Qiime2 artifacts

Next we import these data into QIIME 2 Artifacts. Since the Greengenes reference taxonomy file (99_otu_taxonomy.txt) is a tab-separated (TSV) file without a header, we must specify HeaderlessTSVTaxonomyFormat as the source format since the default source format requires a header.
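To make the header requirement concrete: the headerless file contains one `<feature ID><TAB><taxonomy string>` line per OTU. The default `TSVTaxonomyFormat` instead expects a header line first, such as `Feature ID<TAB>Taxon`. The minimal bash sketch below shows the alternative route of adding that header yourself. `add_taxonomy_header` is a hypothetical helper, and the `HeaderlessTSVTaxonomyFormat` import used below remains the simpler option.

```bash
# Hypothetical helper: print a headerless taxonomy TSV with a
# "Feature ID<TAB>Taxon" header line prepended.
add_taxonomy_header() {
  printf 'Feature ID\tTaxon\n'
  cat "$1"
}
```

As far as I know, the file written by `add_taxonomy_header 99_otu_taxonomy.txt > taxonomy-with-header.tsv` could then be imported as `FeatureData[Taxonomy]` without `--input-format`. The output file name here is illustrative.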

In [3]:
# Import the 13_8 99% OTU sequences
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_8_otus/rep_set/99_otus.fasta \
  --output-path classifier/99_otus.qza

Imported classifier/gg_13_8_otus/rep_set/99_otus.fasta as DNASequencesDirectoryFormat to classifier/99_otus.qza

In [4]:
# import 97% OTUs from 13_5
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_5_otus/rep_set/97_otus.fasta \
  --output-path classifier/13_5_97_otus.qza

Imported /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_5_otus/rep_set/97_otus.fasta as DNASequencesDirectoryFormat to classifier/13_5_97_otus.qza

In [4]:
# Import the 13_8 taxonomy file (99% OTUs)
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_8_otus/taxonomy/99_otu_taxonomy.txt  \
  --output-path classifier/ref-taxonomy.qza

Imported classifier/gg_13_8_otus/taxonomy/99_otu_taxonomy.txt as HeaderlessTSVTaxonomyFormat to classifier/ref-taxonomy.qza

In [5]:
# Import the 13_5 taxonomy file (97% OTUs)
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt \
  --output-path classifier/13_5_97_ref-taxonomy.qza

Imported /home/arriyaz/data/16S_metagenomics/classifiers/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt as HeaderlessTSVTaxonomyFormat to classifier/13_5_97_ref-taxonomy.qza

### Extract Reference Reads

It has been shown that the taxonomic classification accuracy of 16S rRNA gene sequences improves when a Naive Bayes classifier is trained on only the region of the target sequences that was sequenced ([Werner et al., 2012](https://pubmed.ncbi.nlm.nih.gov/21716311/)). This may not generalize to other marker genes (see the note on fungal ITS classification in the tutorial linked above).



* The sequence reads in our study are **220-base paired-end reads**, amplified with the **341F/806R** primer pair targeting the 16S rRNA gene.

* We optimize for that here by extracting reads from the reference database that match this primer pair, keeping only reads between 100 bp and 500 bp long, because our amplicon is about **466 bp**.
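The primers below use IUPAC degenerate codes (for example `Y` = C or T, `N` = any base). The reverse primer binds the opposite strand, so it appears reverse-complemented in the reference sequences. The bash sketch below makes both points visible by turning a primer into a regular expression. `iupac_to_regex` and `revcomp` are hypothetical helpers for eyeballing matches only. `extract-reads` does its own matching and tolerates some mismatches, so an exact regex match is stricter than what it accepts.

```bash
# Hypothetical helper: expand IUPAC degenerate bases into regex character classes.
# The replacement classes only contain A/C/G/T, so later rules never re-match them.
iupac_to_regex() {
  echo "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
                  -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' \
                  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
                  -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Hypothetical helper: reverse complement, including IUPAC codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
revcomp() {
  echo "$1" | awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' \
            | tr 'ACGTRYKMBVDHSWN' 'TGCAYRMKVBHDSWN'
}
```

For example, `iupac_to_regex CCTAYGGGRBGCASCAG` gives `CCTA[CT]GGG[AG][CGT]GCA[CG]CAG`, and `revcomp GGACTACNNGGGTATCTAAT` gives `ATTAGATACCCNNGTAGTCC`. Passing these to `grep -E` against a FASTA file that has one sequence per line gives a rough count of exact primer hits.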

In [5]:
# Extract reads from gg_13_8 99% OTUs
qiime feature-classifier extract-reads \
  --i-sequences classifier/99_otus.qza \
  --p-f-primer CCTAYGGGRBGCASCAG \
  --p-r-primer GGACTACNNGGGTATCTAAT \
  --p-min-length 100 \
  --p-max-length 500 \
  --p-n-jobs 5 \
  --o-reads classifier/ref-seqs.qza

Saved FeatureData[Sequence] to: classifier/ref-seqs.qza

In [6]:
# Extract reads from gg_13_5 97% OTUs
qiime feature-classifier extract-reads \
  --i-sequences classifier/13_5_97_otus.qza \
  --p-f-primer CCTAYGGGRBGCASCAG \
  --p-r-primer GGACTACNNGGGTATCTAAT \
  --p-min-length 100 \
  --p-max-length 500 \
  --p-n-jobs 5 \
  --o-reads classifier/13_5_97-ref-seqs.qza

Saved FeatureData[Sequence] to: classifier/13_5_97-ref-seqs.qza

### Train the classifier

We can now train a Naive Bayes classifier as follows, using the reference reads and taxonomy that we just created.

In [7]:
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads classifier/ref-seqs.qza \
  --i-reference-taxonomy classifier/ref-taxonomy.qza \
  --o-classifier classifier/gg-13_8-99otu-341f-806r-nb-classifier.qza \
  --verbose

Saved TaxonomicClassifier to: classifier/gg-13_8-99otu-341f-806r-nb-classifier.qza

In [7]:
# A classifier with 13_5 at 97% OTUs
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads classifier/13_5_97-ref-seqs.qza \
  --i-reference-taxonomy classifier/13_5_97_ref-taxonomy.qza \
  --o-classifier classifier/gg-13_5-97otu-341f-806r-nb-classifier.qza \
  --verbose

Saved TaxonomicClassifier to: classifier/gg-13_5-97otu-341f-806r-nb-classifier.qza

### Run the classifier

In [4]:
qiime feature-classifier --help

Usage: [34mqiime feature-classifier[0m [OPTIONS] COMMAND [ARGS]...

  Description: This QIIME 2 plugin supports taxonomic classification of
  features using a variety of methods, including Naive Bayes, vsearch, and
  BLAST+.

  Plugin website: https://github.com/qiime2/q2-feature-classifier

  Getting user support: Please post to the QIIME 2 forum for help with this
  plugin: https://forum.qiime2.org

[1mOptions[0m:
  [34m--version[0m    Show the version and exit.
  [34m--citations[0m  Show citations and exit.
  [34m--help[0m       Show this message and exit.

[1mCommands[0m:
  [34mclassify-consensus-blast[0m        BLAST+ consensus taxonomy classifier
  [34mclassify-consensus-vsearch[0m      VSEARCH-based consensus taxonomy classifier
  [34mclassify-hybrid-vsearch-sklearn[0m
                                  ALPHA Hybrid classifier: VSEARCH exact match
                                  + sklearn classifier

  [34mclassify-sklearn[0m                Pre-fitted sklear

: 1

In [9]:
qiime feature-classifier classify-sklearn \
  --i-classifier classifier/gg-13_8-99otu-341f-806r-nb-classifier.qza \
  --i-reads qza/rep-seqs.qza \
  --o-classification qza/taxonomy.qza \
  --p-n-jobs 4 \
  --verbose

Saved FeatureData[Taxonomy] to: qza/taxonomy.qza

In [10]:
# create taxonomy visualization
qiime metadata tabulate \
  --m-input-file qza/taxonomy.qza \
  --o-visualization qzv/taxonomy.qzv

Saved Visualization to: qzv/taxonomy.qzv

## Generate barplot for taxonomic composition

----

**Create bar plot with all data**

In [8]:
qiime taxa --citations

No citations found.

In [16]:
# Create taxonomy barplot
qiime taxa barplot \
  --i-table qza/table.qza \
  --i-taxonomy qza/taxonomy.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --o-visualization qzv/taxa_barplot.qzv

Saved Visualization to: qzv/taxa_barplot.qzv

**Export the bar plot as HTML so that it can be used as supplementary material**

In [17]:
qiime tools export \
    --input-path qzv/taxa_barplot.qzv \
    --output-path results/result_files/taxa-bar-plot

Exported qzv/taxa_barplot.qzv as Visualization to directory results/result_files/taxa-bar-plot

## Generate Krona Plot

**First, export `taxonomy.qza` and `table.qza` into files suitable for Krona plot analysis**

### Export the taxonomy (classification) and feature table files

In [18]:
mkdir -p results/result_files/Krona_plot

In [19]:
# Export table.qza as a BIOM file (feature-table.biom)
qiime tools export \
    --input-path qza/table.qza \
    --output-path results/result_files/Krona_plot

Exported qza/table.qza as BIOMV210DirFmt to directory results/result_files/Krona_plot

In [20]:
# Export the taxonomy artifact as taxonomy.tsv
qiime tools export \
    --input-path qza/taxonomy.qza \
    --output-path results/result_files/Krona_plot

Exported qza/taxonomy.qza as TSVTaxonomyDirectoryFormat to directory results/result_files/Krona_plot

### Convert the `biom` table

**The code below converts the `biom` table into a classic TSV file (an ordinary ASV abundance table).**

For details of the `convert` command, run `biom convert -h`.

In [21]:
biom convert \
    -i results/result_files/Krona_plot/feature-table.biom \
    -o results/result_files/Krona_plot/feature-table.tsv \
    --to-tsv


### Prepare suitable files to feed into `Krona` tool

**Now we will use [krona_qiime.py](https://github.com/lokeshbio/Amplicon_course/blob/293f8773aa46da950fab79e40cb3f2b4f71658d2/krona_qiime.py) script by [lokeshbio](https://github.com/lokeshbio).**

**This script combines the taxonomic classification of each ASV with its abundance.**

**<font color=red>NB: This script writes its output files to the current directory, so we will move them into the `text_files` directory. </font>**
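The internals of `krona_qiime.py` are not reproduced here. As a rough illustration of what combining classification with abundance produces, the bash/awk sketch below writes one file per sample in the layout `ktImportText` accepts: a count, followed by tab-separated lineage levels (see the `ktImportText` usage text further down). The sketch assumes the exported `taxonomy.tsv` (a header line, then `Feature ID`, `Taxon`, `Confidence` columns) and the `biom convert --to-tsv` table. `make_krona_inputs`, its argument order and the `<sample>.krona.txt` naming are hypothetical.

```bash
# Hypothetical sketch: one Krona text file per sample, each line being
# <count> TAB <rank1> TAB <rank2> ... (only non-zero counts are written).
make_krona_inputs() {
  local taxonomy_tsv=$1 table_tsv=$2 outdir=$3
  mkdir -p "$outdir"
  awk -F '\t' -v outdir="$outdir" '
    # First file (taxonomy.tsv): map feature ID -> tab-separated lineage
    FNR == NR {
      if (FNR > 1) {                      # skip the header line
        n = split($2, ranks, "; *")
        lineage = ""
        for (i = 1; i <= n; i++) {
          sub(/^[a-z]__/, "", ranks[i])   # drop prefixes such as "p__"
          if (ranks[i] == "") break       # stop at the first unnamed rank
          lineage = lineage "\t" ranks[i]
        }
        tax[$1] = lineage
      }
      next
    }
    # Second file (biom convert --to-tsv output)
    /^# Constructed/ { next }
    /^#OTU ID/ { for (j = 2; j <= NF; j++) sample[j] = $j; next }
    {
      for (j = 2; j <= NF; j++)
        if ($j + 0 > 0)
          print $j tax[$1] > (outdir "/" sample[j] ".krona.txt")
    }
  ' "$taxonomy_tsv" "$table_tsv"
}
```

A feature with no taxonomy entry is written as a bare count. According to the `ktImportText` usage text, such a line contributes to the top level of the chart.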

In [22]:
# Run the script
python scripts/krona_qiime.py  \
    results/result_files/Krona_plot/taxonomy.tsv \
    results/result_files/Krona_plot/feature-table.tsv


In [23]:
# Recursively remove the text_files directory if it already exists
rm -rf results/result_files/Krona_plot/text_files

# Recreate an empty text_files directory
mkdir results/result_files/Krona_plot/text_files

In [24]:
# Move all files to this directory that have been generated by krona_qiime.py script
mv B*.krona.txt results/result_files/Krona_plot/text_files


### Run the krona tool

KronaTools can be installed with `conda install -c bioconda krona`, which provides the `ktImportText` command.

In [25]:
ktImportText


                                           _________________________________
__________________________________________/ KronaTools 2.7.1 - ktImportText \___

Creates a Krona chart from text files listing quantities and lineages.
                                                                     _______
____________________________________________________________________/ Usage \___

ktImportText \
   [options] \
   text_1[,name_1] \
   [text_2[,name_2]] \
   ...

   text  Tab-delimited text file. Each line should be a number followed by a
         list of wedges to contribute to (starting from the highest level). If
         no wedges are listed (and just a quantity is given), it will
         contribute to the top level. If the same lineage is listed more than
         once, the values will be added. Quantities can be omitted if -q is
         specified. Lines beginning with "#" will be ignored. By default,
         separate datasets will be created for each input (see [-c]).

   n


#### Create the Krona plot

In [25]:
ktImportText \
    -c results/result_files/Krona_plot/text_files/* \
    -n BD_Indigenous_Cohort \
    -o results/result_files/Krona_taxa.html

Writing results/result_files/Krona_taxa.html...

# Export Feature Table

In [2]:
# Collapse the feature table at species level (7)
qiime taxa collapse \
  --i-table qza/table.qza \
  --i-taxonomy qza/taxonomy.qza \
  --p-level 7 \
  --o-collapsed-table qza/feature-table-with-taxa-name-l7.qza

Saved FeatureTable[Frequency] to: qza/feature-table-with-taxa-name-l7.qza
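`--p-level` counts ranks from the top of the taxonomy string. For the seven-rank Greengenes taxonomy used here, level 5 is family (the level used in the heatmap section) and level 7 is species. Below is a tiny bash lookup that assumes this seven-rank layout. `rank_for_level` is a hypothetical name.

```bash
# Hypothetical lookup: qiime taxa collapse --p-level -> rank name,
# assuming the seven-rank Greengenes layout (k__; p__; c__; o__; f__; g__; s__).
rank_for_level() {
  case "$1" in
    1) echo kingdom ;; 2) echo phylum ;; 3) echo class ;; 4) echo order ;;
    5) echo family ;;  6) echo genus ;;  7) echo species ;;
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
}
```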

In [4]:
# Export as biom file
qiime tools export \
    --input-path qza/feature-table-with-taxa-name-l7.qza \
    --output-path qza/feature-table-with-taxa-name-l7


# Convert the BIOM (HDF5) table into a classic TSV table
biom convert \
    -i qza/feature-table-with-taxa-name-l7/feature-table.biom \
    -o qzv/feature-table-with-taxa-name-l7.tsv \
    --to-tsv

Exported qza/feature-table-with-taxa-name-l7.qza as BIOMV210DirFmt to directory qza/feature-table-with-taxa-name-l7

# Heatmap

In [None]:
qiime feature-table heatmap --help

Usage: [34mqiime feature-table heatmap[0m [OPTIONS]

  Generate a heatmap representation of a feature table with optional
  clustering on both the sample and feature axes.

  Tip: To generate a heatmap containing taxonomic annotations, use `qiime
  taxa collapse` to collapse the feature table at the desired taxonomic
  level.

[1mInputs[0m:
  [34m[4m--i-table[0m ARTIFACT [32mFeatureTable[Frequency][0m
                         The feature table to visualize.            [35m[required][0m
[1mParameters[0m:
  [34m--m-sample-metadata-file[0m METADATA
  [34m--m-sample-metadata-column[0m COLUMN  [32mMetadataColumn[Categorical][0m
                         Annotate the sample IDs with these sample metadata
                         values. When metadata is present and
                         `cluster`='feature', samples will be sorted by the
                         metadata values.                           [35m[optional][0m
  [34m--m-feature-metadata-file[0m METADATA
  

: 1

## Data preparation

In [None]:
# Collapse the feature table at family level (5)
qiime taxa collapse \
  --i-table qza/table.qza \
  --i-taxonomy qza/taxonomy.qza \
  --p-level 5 \
  --o-collapsed-table qza/table-l5.qza

Saved FeatureTable[Frequency] to: qza/table-l5.za.qza

Each feature in `table-l5.qza` is named by its full lineage: kingdom, phylum, class, order and family.

These long names would make the heatmap unreadable, so we will keep only the **family** name.

I do this semi-manually, in the following steps:
1. Export `table-l5.qza` to a BIOM table (HDF5 format).
2. Convert the `feature-table.biom` file into TSV (a classic BIOM table).
3. Open the `tsv` file in **LibreOffice Calc** and remove the extra part of each feature name.
4. Convert the edited `tsv` file back into an HDF5 BIOM file.
5. Import the `biom` file as a **QIIME 2 artifact**.

**Steps 1 and 2**

In [31]:
# Export as biom file
qiime tools export \
    --input-path qza/table-l5.qza \
    --output-path qza/table-l5-biom


# Convert the BIOM (HDF5) table into a classic TSV table
biom convert \
    -i qza/table-l5-biom/feature-table.biom \
    -o qza/table-l5-biom/feature-table.tsv \
    --to-tsv

Exported qza/table-l5.qza as BIOMV210DirFmt to directory qza/table-l5-biom

**Step 3**

Now open the `tsv` file in **LibreOffice Calc**, remove the extra part of each feature name, and save the result as a tab-delimited file named `qza/table-l5-biom/feature-table-family-only.csv` (the input used in Step 4).
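For a scripted alternative to the spreadsheet edit, the bash/awk sketch below rewrites the first column of the Step 2 TSV so that each feature keeps only its deepest named rank. It assumes collapsed IDs of the form `k__...;p__...;c__...;o__...;f__...`, and `shorten_to_family` is a hypothetical helper. When the family is unassigned (`f__` or a bare `__`), the helper falls back to the nearest named higher rank, so those rows carry an order (or higher) name rather than a family name.

```bash
# Hypothetical helper: keep only the deepest named rank in column 1 of a
# classic BIOM TSV (normally the family for a level-5 collapsed table).
shorten_to_family() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    /^#/ { print; next }                  # keep "# Constructed..." and "#OTU ID" lines
    {
      n = split($1, ranks, ";")
      name = $1                           # fallback: keep the original ID
      for (i = n; i >= 1; i--) {
        r = ranks[i]
        sub(/^ +/, "", r)                 # tolerate "; " separators
        sub(/^[a-z]?__/, "", r)           # strip "f__", "o__" or a bare "__"
        if (r != "") { name = r; break }
      }
      $1 = name
      print
    }' "$1"
}
```

Usage would be `shorten_to_family qza/table-l5-biom/feature-table.tsv > qza/table-l5-biom/feature-table-family-only.csv`. BIOM requires unique feature IDs, so check the result with `cut -f1 qza/table-l5-biom/feature-table-family-only.csv | sort | uniq -d` before Step 4. That command prints any duplicated names.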

**Step 4**

Convert the edited TSV table back into an HDF5 BIOM file.

In [38]:
biom convert \
    -i qza/table-l5-biom/feature-table-family-only.csv \
    -o qza/table-l5-biom/feature-table-family-only.biom \
    --to-hdf5


**Step 5**

Import the BIOM file as a QIIME 2 artifact.

In [39]:
qiime tools import \
    --type 'FeatureTable[Frequency]' \
    --input-path qza/table-l5-biom/feature-table-family-only.biom \
    --input-format BIOMV210Format \
    --output-path qza/feature-table-family-only.qza

Imported qza/table-l5-biom/feature-table-family-only.biom as BIOMV210Format to qza/feature-table-fammily-only.qza

**Let's summarize the newly created feature table so that we can filter out very low-abundance features**

In [41]:
# Summarize the feature table
qiime feature-table summarize \
  --i-table qza/feature-table-family-only.qza \
  --m-sample-metadata-file metadata_v2.tsv \
  --o-visualization qzv/feature-table-family-only.qzv

Saved Visualization to: qzv/feature-table-family-only.qzv

The `feature-table-family-only.qzv` visualization shows that there are only 121 features at the family level, and some of them have a total frequency as low as 1.

I will remove the very low-abundance features so that we can focus on the features that matter. **Here I keep only features with a total frequency of at least 50 across all samples.**

This threshold is an arbitrary choice, not an established rule.
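`--p-min-frequency` and `--p-min-samples` filter on different quantities. The first uses a feature's total count summed over all samples, and the second uses the number of samples in which the feature is non-zero. The bash/awk sketch below computes both from a `biom convert --to-tsv` table, such as the Step 2 output, so you can preview what a given threshold keeps. `feature_stats` and its positional arguments are hypothetical, and `qiime feature-table filter-features` remains the tool that actually filters.

```bash
# Hypothetical helper: per-feature total frequency and prevalence from a
# classic BIOM TSV, keeping rows with total >= MIN_FREQ (cf. --p-min-frequency)
# and prevalence >= MIN_SAMPLES (cf. --p-min-samples).
# Usage: feature_stats TABLE.tsv [MIN_FREQ] [MIN_SAMPLES]
feature_stats() {
  local table_tsv=$1 min_freq=${2:-0} min_samples=${3:-0}
  awk -F '\t' -v min_freq="$min_freq" -v min_samples="$min_samples" '
    BEGIN { OFS = "\t"; print "feature", "total_frequency", "n_samples" }
    /^#/ { next }                         # skip "# Constructed..." and "#OTU ID" lines
    {
      total = 0; present = 0
      for (j = 2; j <= NF; j++) {
        total += $j
        if ($j > 0) present++
      }
      if (total >= min_freq && present >= min_samples)
        print $1, total, present
    }' "$table_tsv"
}
```

For example, `feature_stats qza/table-l5-biom/feature-table-family-only.csv 50` lists the features that the `--p-min-frequency 50` filter below should keep. `feature_stats ... 0 2` shows what an "at least 2 samples" rule would keep instead.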

In [48]:
# Filtering the very low abundance features
qiime feature-table filter-features \
  --i-table qza/feature-table-family-only.qza \
  --p-min-frequency 50 \
  --o-filtered-table qza/feature-table-family-only-filered.qza

Saved FeatureTable[Frequency] to: qza/feature-table-family-only-filered.qza

In [50]:
# Summarize the feature table
qiime feature-table summarize \
  --i-table qza/feature-table-family-only-filered.qza \
  --m-sample-metadata-file metadata_v2.tsv \
  --o-visualization qzv/feature-table-family-only-filered.qzv

Saved Visualization to: qzv/feature-table-family-only-filered.qzv

## Create Heatmap


### Feature Table Heatmap based on Cohort

In [51]:
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Cohort \
    --p-cluster features \
    --o-visualization qzv/feature-table-heatmap-Cohort.qzv

Saved Visualization to: qzv/feature-table-heatmap-Cohort.qzv

In [12]:
# Export heatmap
qiime tools export \
    --input-path qzv/feature-table-heatmap-Cohort.qzv \
    --output-path qzv/heatmap-cohort

Exported qzv/feature-table-heatmap-Cohort.qzv as Visualization to directory qzv/heatmap-cohort

### Feature Table Heatmap based on Chakma

In [53]:
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Chakma \
    --p-cluster features \
    --p-color-scheme viridis \
    --o-visualization qzv/feature-table-heatmap-Chakma.qzv

Saved Visualization to: qzv/feature-table-heatmap-Chakma.qzv

### Feature Table Heatmap based on Sex

In [54]:
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Sex \
    --p-cluster features \
    --p-color-scheme plasma \
    --o-visualization qzv/feature-table-heatmap-Sex.qzv

Saved Visualization to: qzv/feature-table-heatmap-Sex.qzv

### Feature Table Heatmap based on Age Group

In [55]:
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Age_group \
    --p-cluster features \
    --p-color-scheme magma \
    --o-visualization qzv/feature-table-heatmap-Age.qzv

Saved Visualization to: qzv/feature-table-heatmap-Age.qzv

### Feature Table Heatmap based on disease groups (Control, HTN, Gastric_issues, DM, Arthritis)

In [None]:
# Based on Control column
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Control \
    --p-cluster features \
    --p-color-scheme cividis \
    --o-visualization qzv/feature-table-heatmap-Control.qzv
    
# Based on HTN column
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column HTN \
    --p-cluster features \
    --p-color-scheme PiYG_r \
    --o-visualization qzv/feature-table-heatmap-HTN.qzv

# Based on Gastric_issues column
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Gastric_issues \
    --p-cluster features \
    --p-color-scheme cool \
    --o-visualization qzv/feature-table-heatmap-Gastric.qzv


# Based on DM column
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column DM \
    --p-cluster features \
    --p-color-scheme BrBG \
    --o-visualization qzv/feature-table-heatmap-DM.qzv


# Based on Arthritis column
qiime feature-table heatmap \
    --i-table qza/feature-table-family-only-filered.qza \
    --m-sample-metadata-file metadata_v2.tsv \
    --m-sample-metadata-column Arthritis \
    --p-cluster features \
    --p-color-scheme Spectral_r \
    --o-visualization qzv/feature-table-heatmap-Arthritis.qzv
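The five commands above differ only in the metadata column, colour scheme and output name. Below is a bash sketch of the same calls written as a loop. The `heatmaps_by_column` function and the `DRY_RUN` switch (which prints the commands instead of running them) are illustrative additions. The columns, schemes and file names are copied from the cell above.

```bash
# Sketch: the five disease-group heatmaps as one loop.
# Set DRY_RUN=1 to print each qiime command instead of running it.
heatmaps_by_column() {
  local table=qza/feature-table-family-only-filered.qza
  local spec column scheme suffix
  # COLUMN:COLOR_SCHEME:OUTPUT_SUFFIX, copied from the cell above
  for spec in Control:cividis:Control HTN:PiYG_r:HTN Gastric_issues:cool:Gastric \
              DM:BrBG:DM Arthritis:Spectral_r:Arthritis; do
    IFS=: read -r column scheme suffix <<< "$spec"
    ${DRY_RUN:+echo} qiime feature-table heatmap \
      --i-table "$table" \
      --m-sample-metadata-file metadata_v2.tsv \
      --m-sample-metadata-column "$column" \
      --p-cluster features \
      --p-color-scheme "$scheme" \
      --o-visualization "qzv/feature-table-heatmap-${suffix}.qzv"
  done
}
```

Running `DRY_RUN=1 heatmaps_by_column` first is a cheap way to check the generated commands before spending time on the real run.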

# Core Features

Now we will identify core features at the **family** level.

The QIIME 2 `feature-table` plugin provides a `core-features` command that identifies the core microbiome of a set of samples.

Here I used the `feature-table-family-only-filered.qza` artifact to identify **core features**. I had already edited this artifact manually for the **heatmap** (removing the extra part of the feature names), which makes it suitable for presentation.

In [10]:
qiime feature-table core-features \
    --i-table qza/feature-table-family-only-filered.qza \
    --o-visualization qzv/core-features-family-filtered.qzv

Saved Visualization to: qzv/core-features-family-filtered.qzv

`core-features-family-filtered.qzv` shows that 19 microbial families (features) are present in the **full fraction (1.00)** of the samples, **that is, in all 42 samples**.

When the feature list is downloaded in **TSV** format, the data is given as a **seven-number summary** for each feature.

<font color = red> I have not yet found a way to plot data in this seven-number summary format. </font>

<font color = green> So I used a workaround: I filtered for the features present in all **42 samples**, which gives the same result as **core features for the full fraction of samples**. </font>

Then:
- Export the artifact as an HDF5 BIOM file
- Convert it to TSV format

This gives the counts of each core microbial family in each sample.

<font color = blue > Next, I will load this TSV file in the **R_codes** notebook to generate a box plot. </font>

In [11]:
# Filtering the core features
qiime feature-table filter-features \
  --i-table qza/feature-table-family-only.qza \
  --p-min-samples 42 \
  --o-filtered-table qza/core-feature-family-level.qza

Saved FeatureTable[Frequency] to: qza/core-feature-family-level.qza

In [12]:
qiime tools export \
    --input-path qza/core-feature-family-level.qza \
    --output-path qza/core-features

Exported qza/core-feature-family-level.qza as BIOMV210DirFmt to directory qza/core-features

In [14]:
biom convert \
    -i qza/core-features/feature-table.biom \
    -o qza/core-features/core-feature-family-level.tsv \
    --to-tsv


# Generate a tree for phylogenetic diversity analyses
***

### Perform phylogenetic diversity by fragment insertion

In [2]:
# Download the reference database
wget \
  -O "sepp-refs-gg-13-8.qza" \
  "https://data.qiime2.org/2021.4/common/sepp-refs-gg-13-8.qza"

--2021-08-12 11:44:44--  https://data.qiime2.org/2021.4/common/sepp-refs-gg-13-8.qza
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2021.4/common/sepp-refs-gg-13-8.qza [following]
--2021-08-12 11:44:45--  https://s3-us-west-2.amazonaws.com/qiime2-data/2021.4/common/sepp-refs-gg-13-8.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.92.145.112
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.92.145.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50161069 (48M) [binary/octet-stream]
Saving to: 'sepp-refs-gg-13-8.qza'


2021-08-12 11:48:05 (247 KB/s) - 'sepp-refs-gg-13-8.qza' saved [50161069/50161069]


In [3]:
qiime fragment-insertion sepp --help

Usage: [34mqiime fragment-insertion sepp[0m [OPTIONS]

  Perform fragment insertion of sequences using the SEPP algorithm.

[1mInputs[0m:
  [34m[4m--i-representative-sequences[0m ARTIFACT [32mFeatureData[Sequence][0m
                       The sequences to insert into the reference tree.
                                                                    [35m[required][0m
  [34m[4m--i-reference-database[0m ARTIFACT [32mSeppReferenceDatabase[0m
                       The reference database to insert the representative
                       sequences into.                              [35m[required][0m
[1mParameters[0m:
  [34m--p-alignment-subset-size[0m INTEGER
                       Each placement subset is further broken into subsets
                       of at most these many sequences and a separate HMM is
                       trained on each subset.                 [35m[default: 1000][0m
  [34m--p-placement-subset-size[0m INTEGER
                      

: 1

In [4]:
# Set --p-threads to the number of CPU cores available on your machine
qiime fragment-insertion sepp \
  --i-representative-sequences qza/rep-seqs.qza \
  --i-reference-database qza/sepp-refs-gg-13-8.qza \
  --o-tree qza/sepp_tree.qza \
  --o-placements qza/sepp_tree_placements.qza \
  --p-threads 60 \
  --verbose

Saved Phylogeny[Rooted] to: qza/sepp_tree.qza
Saved Placements to: qza/_sepptree_placements.qza

In [6]:
# Export tree
qiime tools export \
  --input-path qza/sepp_tree.qza \
  --output-path sepp_exported-tree

Exported qza/sepp_tree.qza as NewickDirectoryFormat to directory sepp_exported-tree

### Perform phylogenetic diversity in one step (extra)

The `qiime phylogeny align-to-tree-mafft-fasttree` pipeline performs all of the tree-building steps (alignment, masking, tree inference and midpoint rooting) in a single run.

**To carry out these steps one at a time instead, follow the *Perform phylogenetic diversity step by step* section.**
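A minimal sketch of the one-step alternative, assuming the same `qza/rep-seqs.qza` input used for SEPP. The output file names are assumptions chosen to match the step-by-step cells (the beta rarefaction cell later reads `qza/rooted-tree.qza`); the `command -v` check only skips the call when the qiime2 environment is not active.

```shell
# One-step tree building: MAFFT alignment, masking, FastTree and midpoint rooting.
# Output paths are assumptions that mirror the step-by-step section.
rep_seqs=qza/rep-seqs.qza
rooted_tree=qza/rooted-tree.qza

if command -v qiime >/dev/null 2>&1; then
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$rep_seqs" \
    --o-alignment qza/aligned-rep-seqs.qza \
    --o-masked-alignment qza/masked-aligned-rep-seqs.qza \
    --o-tree qza/unrooted-tree.qza \
    --o-rooted-tree "$rooted_tree"
else
  echo "qiime not found; run 'conda activate qiime2' first" >&2
fi
```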

### Perform phylogenetic diversity step by step (extra)

In [4]:
# perform multiple sequence alignment
qiime alignment mafft \
  --i-sequences qza/rep-seqs.qza \
  --o-alignment qza/aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: qza/aligned-rep-seqs.qza

Next, we mask (or filter) the alignment to remove positions that are highly variable. These positions are generally considered to add noise to a resulting phylogenetic tree.

In [54]:
qiime alignment mask \
  --i-alignment qza/aligned-rep-seqs.qza \
  --o-masked-alignment qza/masked-aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: qza/masked-aligned-rep-seqs.qza

Next, we’ll apply FastTree to generate a phylogenetic tree from the masked alignment.
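The FastTree cell itself is not included in this notebook. A sketch of that step followed by midpoint rooting, since the beta rarefaction cell later reads a rooted tree from `qza/rooted-tree.qza`; the unrooted-tree file name is an assumption, and the `command -v` check only skips the calls when the qiime2 environment is not active.

```shell
# Infer a tree from the masked alignment with FastTree, then root it at its midpoint.
masked_alignment=qza/masked-aligned-rep-seqs.qza
unrooted_tree=qza/unrooted-tree.qza   # assumed output name
rooted_tree=qza/rooted-tree.qza       # name read by the beta-rarefaction cell

if command -v qiime >/dev/null 2>&1; then
  qiime phylogeny fasttree \
    --i-alignment "$masked_alignment" \
    --o-tree "$unrooted_tree"
  qiime phylogeny midpoint-root \
    --i-tree "$unrooted_tree" \
    --o-rooted-tree "$rooted_tree"
else
  echo "qiime not found; run 'conda activate qiime2' first" >&2
fi
```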

## Alpha Rarefaction Curve

In [8]:
qiime diversity --citations

No citations found.
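The next cell exports `qzv/alpha-rarefaction.qzv`, but the command that produced it is not part of this notebook. A minimal sketch of how it could be generated with `qiime diversity alpha-rarefaction`; the `--p-max-depth` value is an assumption (it reuses the 34568 sampling depth chosen for core metrics below), as is the use of the SEPP tree for the phylogenetic metrics.

```shell
# Alpha rarefaction curves. --p-max-depth is an assumption: it reuses the
# sampling depth chosen for core-metrics-phylogenetic; adjust it to your table.
max_depth=34568
alpha_rarefaction=qzv/alpha-rarefaction.qzv   # file exported in the next cell

if command -v qiime >/dev/null 2>&1; then
  qiime diversity alpha-rarefaction \
    --i-table qza/table.qza \
    --i-phylogeny qza/sepp_tree.qza \
    --p-max-depth "$max_depth" \
    --m-metadata-file metadata_16s.tsv \
    --o-visualization "$alpha_rarefaction"
else
  echo "qiime not found; run 'conda activate qiime2' first" >&2
fi
```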

In [4]:
qiime tools export \
    --input-path qzv/alpha-rarefaction.qzv \
    --output-path qzv/alpha-rarefaction

Exported qzv/alpha-rarefaction.qzv as Visualization to directory qzv/alpha-rarefaction

## Core Metrics

In [7]:
qiime diversity core-metrics-phylogenetic --help

Usage: qiime diversity core-metrics-phylogenetic [OPTIONS]

  Applies a collection of diversity metrics (both phylogenetic and non-
  phylogenetic) to a feature table.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                          The feature table containing the samples over which
                          diversity metrics should be computed.     [required]
  --i-phylogeny ARTIFACT  Phylogenetic tree containing tip identifiers that
    Phylogeny[Rooted]     correspond to the feature identifiers in the table.
                          This tree can contain tip ids that are not present
                          in the table, but all feature ids in the table must
                          be present in this tree.                  [required]
Parameters:
  --p-sampling-depth INTEGER
    Range(1, None)        The total frequency that each sample should be

In [6]:
qiime diversity core-metrics-phylogenetic \
  --i-table qza/table.qza \
  --i-phylogeny qza/sepp_tree.qza \
  --m-metadata-file metadata_16s.tsv \
  --p-sampling-depth 34568 \
  --p-n-jobs-or-threads 'auto' \
  --output-dir ./core-metrics-results

Saved FeatureTable[Frequency] to: ./core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-results/observed_features_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: ./core-metrics-results/evenness_vector.qza
Saved DistanceMatrix to: ./core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: ./core-metrics-results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: ./core-metrics-results/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: ./core-metrics-results/weighted_unifrac_pcoa_results.qza
Save

In [8]:
qiime feature-table summarize \
  --i-table core-metrics-results/rarefied_table.qza \
  --m-sample-metadata-file metadata_16s.tsv \
  --o-visualization qzv/rarefied_table.qzv

Saved Visualization to: qzv/rarefied_table.qzv

# Alpha Diversity

### Observed Features

In [9]:
qiime diversity alpha-group-significance --help

Usage: qiime diversity alpha-group-significance [OPTIONS]

  Visually and statistically compare groups of alpha diversity values.

Inputs:
  --i-alpha-diversity ARTIFACT SampleData[AlphaDiversity]
                       Vector of alpha diversity values by sample.  [required]
Parameters:
  --m-metadata-file METADATA...
    (multiple          The sample metadata.
     arguments will
     be merged)                                                     [required]
Outputs:
  --o-visualization VISUALIZATION
                                                                    [required]
Miscellaneous:
  --output-dir PATH    Output unspecified results to a directory
  --verbose / --quiet  Display verbose output to stdout and/or stderr during
                       execution of this action. Or silence output if
                       execution is succe

In [9]:
qiime diversity alpha-group-significance \
  --i-alpha-diversity ./core-metrics-results/observed_features_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --o-visualization ./core-metrics-results/observed_features_statistics.qzv

Saved Visualization to: ./core-metrics-results/observed_features_statistics.qzv

### Shannon Index

In [3]:
qiime diversity alpha-group-significance \
  --i-alpha-diversity ./core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --o-visualization ./core-metrics-results/shannon_vector_statistics.qzv

Saved Visualization to: ./core-metrics-results/shannon_vector_statistics.qzv

### Faith PD

In [4]:
qiime diversity alpha-group-significance \
  --i-alpha-diversity ./core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --o-visualization ./core-metrics-results/faith_pd_statistics.qzv

Saved Visualization to: ./core-metrics-results/faith_pd_statistics.qzv

### Evenness

In [6]:
qiime diversity alpha-group-significance \
  --i-alpha-diversity ./core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --o-visualization ./core-metrics-results/evenness_statistics.qzv

Saved Visualization to: ./core-metrics-results/evenness_statistics.qzv
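The four alpha-group-significance cells above differ only in the input vector and the output name. A sketch that prints the same four commands from a list of `vector:output` pairs (the function name is hypothetical); printing rather than running lets you review the commands before piping them to `bash`.

```shell
# Print the four alpha-group-significance commands used above
# (sketch; function name is hypothetical). Each pair is "input vector:output name".
gen_alpha_cmds() {
  for pair in \
    observed_features_vector:observed_features_statistics \
    shannon_vector:shannon_vector_statistics \
    faith_pd_vector:faith_pd_statistics \
    evenness_vector:evenness_statistics; do
    vector=${pair%%:*}   # text before the colon
    output=${pair#*:}    # text after the colon
    echo "qiime diversity alpha-group-significance" \
      "--i-alpha-diversity ./core-metrics-results/${vector}.qza" \
      "--m-metadata-file ./metadata_16s.tsv" \
      "--o-visualization ./core-metrics-results/${output}.qzv"
  done
}

# Review the printed commands, then run them with: gen_alpha_cmds | bash
gen_alpha_cmds
```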

## ANOVA test

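The goal of these tests is to see whether Cohort has an effect on alpha diversity once the other metadata categories are taken into account, so ANOVA models were fitted to every alpha diversity metric with `qiime longitudinal anova`.

The formulas use R-style (patsy) syntax. `Cohort*(Sex+Age_group)` expands to the main effects Cohort, Sex and Age_group plus the Cohort:Sex and Cohort:Age_group interactions. The `-` operator, by contrast, removes terms from the model rather than subtracting their effect, so the *Cohort minus other categories* formulas below reduce to a one-way ANOVA on Cohort alone; covariates are normally adjusted for by adding them with `+`.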
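For comparison, a sketch of a model that adjusts for the non-disease covariates by adding them with `+`, so that Cohort is tested in a model that also contains Sex and Age_group (no interaction terms). This is an assumption about how such a model could be set up, not an analysis run in this notebook; the output file name is hypothetical.

```shell
# Hypothetical covariate-adjusted model: Cohort plus Sex and Age_group as
# additive terms. The output file name is an assumption.
formula='shannon_entropy ~ Cohort + Sex + Age_group'
out=./core-metrics-results/shannon-cohort-adjusted-anova.qzv

if command -v qiime >/dev/null 2>&1; then
  qiime longitudinal anova \
    --m-metadata-file ./core-metrics-results/shannon_vector.qza \
    --m-metadata-file ./metadata_16s.tsv \
    --p-formula "$formula" \
    --o-visualization "$out"
else
  echo "qiime not found; run 'conda activate qiime2' first" >&2
fi
```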

In [2]:
qiime longitudinal anova --help

Usage: qiime longitudinal anova [OPTIONS]

  Perform an ANOVA test on any factors present in a metadata file and/or
  metadata-transformable artifacts. This is followed by pairwise t-tests to
  examine pairwise differences between categorical sample groups.

Parameters:
  --m-metadata-file METADATA...
    (multiple          Sample metadata containing formula terms.
     arguments will
     be merged)                                                     [required]
  --p-formula TEXT     R-style formula specifying the model. All terms must
                       be present in the sample metadata or
                       metadata-transformable artifacts and can be continuous
                       or categorical metadata columns. Formulae will be in
                       the format "a ~ b + c", where "a" is the metric
                       (dependent variable) and "b" and "c" are independent
                       covariates. Use "

#### **Cohort and non-disease categories**

In [78]:
# ANOVA for observed features
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/observed_features_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'observed_features ~ Cohort*(Sex+Age_group)' \
  --o-visualization ./core-metrics-results/of-cohort-age_sex-anova.qzv

Saved Visualization to: ./core-metrics-results/of-cohort-age_sex-anova.qzv

In [69]:
# ANOVA for shannon index
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'shannon_entropy ~ Cohort*(Sex+Age_group)' \
  --o-visualization ./core-metrics-results/shannon-cohort-sex_age-anova.qzv

Saved Visualization to: ./core-metrics-results/shannon-cohort-sex_age-anova.qzv

In [72]:
# ANOVA for evenness
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'pielou_evenness ~ Cohort*(Sex+Age_group)' \
  --o-visualization ./core-metrics-results/evenness-cohort-age_sex-anova.qzv

Saved Visualization to: ./core-metrics-results/evenness-cohort-age_sex-anova.qzv

In [71]:
# ANOVA for faith pd
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'faith_pd ~ Cohort*(Sex+Age_group)' \
  --o-visualization ./core-metrics-results/faith_pd-cohort-sex_age-anova.qzv

Saved Visualization to: ./core-metrics-results/faith_pd-cohort-sex_age-anova.qzv

____

#### **Cohort and Disease categories**

In [79]:
# ANOVA for observed features
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/observed_features_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'observed_features ~ Cohort*(Gastric_issues+HTN+DM+Arthritis)' \
  --o-visualization ./core-metrics-results/cohort-disease-anova/of-cohort_diseases-anova.qzv

Saved Visualization to: ./core-metrics-results/cohort-disease-anova/of-cohort_diseases-anova.qzv

In [75]:
# ANOVA for shannon index
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'shannon_entropy ~ Cohort*(Gastric_issues+HTN+DM+Arthritis)' \
  --o-visualization ./core-metrics-results/shannon-cohort_diseases-anova.qzv

Saved Visualization to: ./core-metrics-results/shannon-cohort_diseases-anova.qzv

In [76]:
# ANOVA for faith pd
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'faith_pd ~ Cohort*(Gastric_issues+HTN+DM+Arthritis)' \
  --o-visualization ./core-metrics-results/faith_pd-cohort_diseases-anova.qzv

Saved Visualization to: ./core-metrics-results/faith_pd-cohort_diseases-anova.qzv

In [77]:
# ANOVA for evenness
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'pielou_evenness ~ Cohort*(Gastric_issues+HTN+DM+Arthritis)' \
  --o-visualization ./core-metrics-results/evenness-cohort_diseases-anova.qzv

Saved Visualization to: ./core-metrics-results/evenness-cohort_diseases-anova.qzv

#### **Cohort minus other categories**

In [44]:
# ANOVA for observed features
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/observed_features_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'observed_features ~ Cohort-(Gastric_issues+Sex+Age_group+HTN+DM+Control+Arthritis)' \
  --o-visualization ./core-metrics-results/of-only-cohort-anova.qzv

Saved Visualization to: ./core-metrics-results/of-only-cohort-anova.qzv

In [45]:
# ANOVA for shannon index
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/shannon_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'shannon_entropy ~ Cohort-(Gastric_issues+Sex+Age_group+HTN+DM+Control+Arthritis)' \
  --o-visualization ./core-metrics-results/shannon-only-cohort-anova.qzv

Saved Visualization to: ./core-metrics-results/shannon-only-cohort-anova.qzv

In [46]:
# ANOVA for faith pd
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'faith_pd ~ Cohort-(Gastric_issues+Sex+Age_group+HTN+DM+Control+Arthritis)' \
  --o-visualization ./core-metrics-results/faith_pd-only-cohort-anova.qzv

Saved Visualization to: ./core-metrics-results/faith_pd-only-cohort-anova.qzv

In [48]:
# ANOVA for evenness
qiime longitudinal anova \
  --m-metadata-file ./core-metrics-results/evenness_vector.qza \
  --m-metadata-file ./metadata_16s.tsv \
  --p-formula 'pielou_evenness ~ Cohort-(Gastric_issues+Sex+Age_group+HTN+DM+Control+Arthritis)' \
  --o-visualization ./core-metrics-results/evenness-only-cohort-anova.qzv

Saved Visualization to: ./core-metrics-results/evenness-only-cohort-anova.qzv

# Beta diversity Analysis

In [3]:
qiime diversity beta-group-significance --help

Usage: qiime diversity beta-group-significance [OPTIONS]

  Determine whether groups of samples are significantly different from one
  another using a permutation-based statistical test.

Inputs:
  --i-distance-matrix ARTIFACT
    DistanceMatrix     Matrix of distances between pairs of samples.
                                                                    [required]
Parameters:
  --m-metadata-file METADATA
  --m-metadata-column COLUMN  MetadataColumn[Categorical]
                       Categorical sample metadata column.          [required]
  --p-method TEXT Choices('permanova', 'anosim', 'permdisp')
                       The group significance test to be applied.
                                                        [default: 'permanova']
  --p-pairwise / --p-no-pairwise
                       Perform pairwise tests between all pairs

### Based On Age_group

In [3]:
# Based on Age_group
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Age_group \
  --o-visualization qzv/jaccard-Age_group-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/jaccard-Age_group-significance.qzv

In [4]:
# Based on Age_group
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Age_group \
  --o-visualization qzv/bray-curtis-Age_group-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/bray-curtis-Age_group-significance.qzv

In [3]:
# Based on Age_group
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Age_group \
  --o-visualization qzv/unweighted-unifrac-Age_group-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/unweighted-unifrac-Age_group-significance.qzv

In [4]:
# Based on Age_group
# Weighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Age_group \
  --o-visualization qzv/weighted-unifrac-Age_group-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/weighted-unifrac-Age_group-significance.qzv

### Based on Cohort

In [3]:
# Based on Cohort
# Jaccard distance
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Cohort \
  --o-visualization qzv/beta-significance/jaccard-Cohort-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-Cohort-significance.qzv

In [10]:
# Based on Cohort
# Bray Curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Cohort \
  --o-visualization qzv/bray-curtis-Cohort-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/bray-curtis-Cohort-significance.qzv

In [5]:
# Based on Cohort
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Cohort \
  --o-visualization qzv/unweighted-unifrac-Cohort-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/unweighted-unifrac-Cohort-significance.qzv

In [6]:
# Based on Cohort
# Weighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Cohort \
  --o-visualization qzv/weighted-unifrac-Cohort-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/weighted-unifrac-Cohort-significance.qzv

In [None]:
# Based on Cohort without pairwise
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Cohort \
  --o-visualization qzv/unweighted-unifrac-Cohort_unpair-significance.qzv

Saved Visualization to: qzv/unweighted-unifrac-Cohort_unpair-significance.qzv

### Based on HTN

In [10]:
# HTN
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column HTN \
  --o-visualization qzv/beta-significance/jaccard-HTN-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-HTN-significance.qzv

In [11]:
# HTN
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column HTN \
  --o-visualization qzv/beta-significance/bray_curtis-HTN-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-HTN-significance.qzv

In [13]:
# HTN
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column HTN \
  --o-visualization qzv/beta-significance/unweighted_unifrac-HTN-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-HTN-significance.qzv

In [12]:
# HTN
# Weighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column HTN \
  --o-visualization qzv/beta-significance/weighted_unifrac-HTN-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-HTN-significance.qzv

### Gastric_issues

In [14]:
# Gastric issues
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Gastric_issues \
  --o-visualization qzv/beta-significance/jaccard-Gastric_issues-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-Gastric_issues-significance.qzv

In [15]:
# Gastric_issues
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Gastric_issues \
  --o-visualization qzv/beta-significance/bray_curtis-Gastric_issues-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-Gastric_issues-significance.qzv

In [16]:
# Gastric_issues
# Weighted Unifrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Gastric_issues \
  --o-visualization qzv/beta-significance/weighted_unifrac-Gastric_issues-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-Gastric_issues-significance.qzv

In [17]:
# Gastric_issues
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Gastric_issues \
  --o-visualization qzv/beta-significance/unweighted_unifrac-Gastric_issues-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-Gastric_issues-significance.qzv

### DM

In [18]:
# DM
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column DM \
  --o-visualization qzv/beta-significance/jaccard-DM-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-DM-significance.qzv

In [19]:
# DM
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column DM \
  --o-visualization qzv/beta-significance/bray_curtis-DM-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-DM-significance.qzv

In [20]:
# DM
# Weighted Unifrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column DM \
  --o-visualization qzv/beta-significance/weighted_unifrac-DM-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-DM-significance.qzv

In [21]:
# DM
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column DM \
  --o-visualization qzv/beta-significance/unweighted_unifrac-DM-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-DM-significance.qzv

### Arthritis

In [22]:
# Arthritis
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Arthritis \
  --o-visualization qzv/beta-significance/jaccard-Arthritis-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-Arthritis-significance.qzv

In [23]:
# Arthritis
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Arthritis \
  --o-visualization qzv/beta-significance/bray_curtis-Arthritis-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-Arthritis-significance.qzv

In [24]:
# Arthritis
# Weighted Unifrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Arthritis \
  --o-visualization qzv/beta-significance/weighted_unifrac-Arthritis-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-Arthritis-significance.qzv

In [25]:
# Arthritis
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Arthritis \
  --o-visualization qzv/beta-significance/unweighted_unifrac-Arthritis-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-Arthritis-significance.qzv

### Sex

In [26]:
# Sex
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Sex \
  --o-visualization qzv/beta-significance/jaccard-Sex-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-Sex-significance.qzv

In [27]:
# Sex
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Sex \
  --o-visualization qzv/beta-significance/bray_curtis-Sex-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-Sex-significance.qzv

In [28]:
# Sex
# Weighted Unifrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Sex \
  --o-visualization qzv/beta-significance/weighted_unifrac-Sex-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-Sex-significance.qzv

In [29]:
# Sex
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Sex \
  --o-visualization qzv/beta-significance/unweighted_unifrac-Sex-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-Sex-significance.qzv

### Control

In [30]:
# Control
# Jaccard
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/jaccard_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Control \
  --o-visualization qzv/beta-significance/jaccard-Control-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/jaccard-Control-significance.qzv

In [31]:
# Control
# Bray-curtis
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/bray_curtis_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Control \
  --o-visualization qzv/beta-significance/bray_curtis-Control-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/bray_curtis-Control-significance.qzv

In [32]:
# Control
# Weighted Unifrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Control \
  --o-visualization qzv/beta-significance/weighted_unifrac-Control-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/weighted_unifrac-Control-significance.qzv

In [33]:
# Control
# Unweighted UniFrac
qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata_16s.tsv \
  --m-metadata-column Control \
  --o-visualization qzv/beta-significance/unweighted_unifrac-Control-significance.qzv \
  --p-pairwise

Saved Visualization to: qzv/beta-significance/unweighted_unifrac-Control-significance.qzv
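The cells for HTN, Gastric_issues, DM, Arthritis, Sex and Control differ only in the distance matrix and the metadata column. A sketch that prints the same 24 commands with the same input and output paths as those cells (the function name is hypothetical); printing first lets you review the commands before piping them to `bash`.

```shell
# Print the beta-group-significance commands for the per-column cells above
# (sketch; function name is hypothetical). Paths match those cells.
gen_beta_cmds() {
  for column in HTN Gastric_issues DM Arthritis Sex Control; do
    for metric in jaccard bray_curtis weighted_unifrac unweighted_unifrac; do
      echo "qiime diversity beta-group-significance" \
        "--i-distance-matrix core-metrics-results/${metric}_distance_matrix.qza" \
        "--m-metadata-file metadata_16s.tsv" \
        "--m-metadata-column ${column}" \
        "--o-visualization qzv/beta-significance/${metric}-${column}-significance.qzv" \
        "--p-pairwise"
    done
  done
}

# Review the printed commands, then run them with: gen_beta_cmds | bash
gen_beta_cmds
```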

### Beta Rarefaction

In [2]:
qiime diversity beta-rarefaction \
    --i-table qza/table.qza \
    --p-metric weighted_unifrac \
    --p-clustering-method upgma \
    --m-metadata-file metadata_16s.tsv \
    --p-sampling-depth 34568 \
    --i-phylogeny qza/rooted-tree.qza \
    --o-visualization qzv/beta-rarefaction.qzv \
    --verbose

Saved Visualization to: qzv/beta-rarefaction.qzv

In [3]:
qiime tools export --help

Usage: qiime tools export [OPTIONS]

  Exporting extracts (and optionally transforms) data stored inside an
  Artifact or Visualization. Note that Visualizations cannot be transformed
  with --output-format

Options:
  --input-path ARTIFACT/VISUALIZATION
                        Path to file that should be exported        [required]
  --output-path PATH    Path to file or directory where data should be
                        exported to                                 [required]
  --output-format TEXT  Format which the data should be exported as. This
                        option cannot be used with Visualizations
  --help                Show this message and exit.

In [4]:
qiime tools export \
    --input-path qzv/beta-rarefaction.qzv \
    --output-path qzv/upgma-tree

Exported qzv/beta-rarefaction.qzv as Visualization to directory qzv/upgma-tree
(qiime2)

# Differential abundance testing with ANCOM
***

In [38]:
qiime composition ancom --citations

No citations found.
(qiime2)

### Perform differential abundance test at genus level

In [27]:
# Collapse the feature table to genus level (taxonomic level 6)
qiime taxa collapse \
  --i-table qza/table.qza \
  --i-taxonomy qza/taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table qza/table-l6.qza

Saved FeatureTable[Frequency] to: qza/table-l6.qza
(qiime2)
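`qiime taxa collapse` merges all features that share the same lineage down to the requested level and sums their counts within each sample. The awk sketch below only illustrates that idea; it assumes a plain TSV whose first column is the full `;`-separated lineage and whose remaining columns are per-sample counts, and the function name `collapse_to_level` is hypothetical. Details such as how QIIME 2 labels lineages shorter than the requested level are not reproduced.

```shell
# Hypothetical illustration of collapsing a feature table to a taxonomic level.
# Input (stdin): a header row, then "<lineage>\t<count sample1>\t<count sample2>..."
# Usage: collapse_to_level LEVEL < table_with_lineages.tsv
collapse_to_level() {
  awk -v lvl="$1" 'BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }                  # keep the header row
    {
      n = split($1, rank, ";")
      key = rank[1]                          # truncate lineage to the first lvl ranks
      for (i = 2; i <= lvl && i <= n; i++) key = key ";" rank[i]
      if (!(key in seen)) { seen[key] = 1; order[++k] = key }
      for (j = 2; j <= NF; j++) total[key, j] += $j   # sum counts per sample
      if (NF > maxnf) maxnf = NF
    }
    END {
      for (r = 1; r <= k; r++) {             # print groups in first-seen order
        line = order[r]
        for (j = 2; j <= maxnf; j++) line = line OFS (total[order[r], j] + 0)
        print line
      }
    }'
}
```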

[Analysis of microbial compositions: a review of normalization and differential abundance analysis](https://www.nature.com/articles/s41522-020-00160-w) reports that ANCOM and ANCOM-BC fail to control the false discovery rate (FDR) at sample sizes below 10.
That is why we set the minimum number of samples to 10 (`--p-min-samples 10`) in the filtering step below.

In [28]:
# Filter out very low-abundance features
qiime feature-table filter-features \
  --i-table qza/table-l6.qza \
  --p-min-frequency 26 \
  --p-min-samples 10 \
  --o-filtered-table qza/filtered-feature-table-ancom.qza

Saved FeatureTable[Frequency] to: qza/filtered-feature-table-ancom.qza
(qiime2)
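For reference, `--p-min-frequency` keeps features whose total count across all samples is at least the given value, and `--p-min-samples` keeps features observed (count greater than zero) in at least that many samples. The awk sketch below applies the same two criteria to a plain feature-by-sample TSV, for example one obtained by exporting the table and converting it with `biom convert --to-tsv`. The input layout (header and comment lines starting with `#`, feature ID in column 1) and the function name `filter_features` are assumptions.

```shell
# Hypothetical re-implementation of the two filter criteria used above.
# Usage: filter_features MIN_FREQUENCY MIN_SAMPLES < feature-table.tsv
filter_features() {
  awk -v minfreq="$1" -v minsamp="$2" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                     # pass header/comment lines through
    {
      total = 0; present = 0
      for (j = 2; j <= NF; j++) {
        total += $j                          # total frequency of the feature
        if ($j > 0) present++                # number of samples it occurs in
      }
      if (total >= minfreq && present >= minsamp) print
    }'
}
```

With the values used above this would be `filter_features 26 10 < feature-table.tsv`.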

ANCOM operates on a `FeatureTable[Composition]` QIIME 2 artifact, which is based on per-sample feature frequencies, **but cannot tolerate frequencies of zero.** To build the composition artifact, a `FeatureTable[Frequency]` artifact must be passed to **add-pseudocount** (an imputation method), which produces the `FeatureTable[Composition]` artifact.
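Zeros are a problem because ANCOM works with log-ratios between features, and log(0) is undefined. A minimal awk illustration of what a pseudocount does, assuming a pseudocount of 1 (the default of `qiime composition add-pseudocount`, adjustable with `--p-pseudocount`): it adds the pseudocount to every count and then computes the natural log-ratio of two features in each sample. The function name and the three-column input layout are hypothetical.

```shell
# Hypothetical illustration: add a pseudocount to every count, then compute
# log(featureA / featureB) per sample, which would be undefined without it.
# Input (stdin): one line per sample, "<sample>\t<count A>\t<count B>"
# Usage: log_ratio_with_pseudocount [PSEUDOCOUNT] < counts.tsv
log_ratio_with_pseudocount() {
  awk -v pc="${1:-1}" 'BEGIN { FS = OFS = "\t" }
    {
      a = $2 + pc; b = $3 + pc               # shift all counts away from zero
      printf "%s%s%.4f\n", $1, OFS, log(a / b)
    }'
}
```

For instance, a sample with counts 0 and 3 gives log(1/4), about -1.3863, instead of an undefined value.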

In [29]:
# add pseudocount and produce the FeatureTable[Composition] artifact.
qiime composition add-pseudocount \
  --i-table qza/filtered-feature-table-ancom.qza \
  --o-composition-table qza/comp-filtered-feature-table-ancom.qza

Saved FeatureTable[Composition] to: qza/comp-filtered-feature-table-ancom.qza
(qiime2)

<font color = red>NB: When I initially performed the **ANCOM** test on the **Cohort** column, **no significant feature was found**. That is why I added one-vs-rest columns (Chakma vs. Non-Chakma, Khiyan vs. Non-Khiyan, etc.) to the metadata (metadata_v2.tsv). This part was performed in R; see **R_code.ipynb** for the code.</font>
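The one-vs-rest columns were built in R. As a shell alternative, here is a minimal awk sketch of the same idea (not the author's R code): it appends a column named after one cohort, whose value is the cohort name for samples in that cohort and `Non-<cohort>` otherwise, matching the values visible in `metadata_v2.tsv`. It assumes the `Cohort` column values are spelled exactly like the new column names; the function name `add_one_vs_rest` is hypothetical.

```shell
# Hypothetical shell equivalent of the R step: append a one-vs-rest column.
# Usage: add_one_vs_rest GROUP < metadata.tsv > metadata_out.tsv
add_one_vs_rest() {
  awk -v grp="$1" 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "Cohort") col = i   # find Cohort column
      if (!col) { print "no Cohort column" > "/dev/stderr"; exit 1 }
      print $0, grp; next                                    # header of new column
    }
    { print $0, ($col == grp ? grp : "Non-" grp) }'
}
```

Calls can be chained, e.g. `add_one_vs_rest Chakma < metadata_16s.tsv | add_one_vs_rest Khiyan | add_one_vs_rest Marma | add_one_vs_rest Tripura > metadata_v2.tsv`.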

<font color = purple> QIIME 2's volcano plots are not well suited for presentation in a paper, so I will include them in a Word file as supplementary material. Instead, I will create comparative boxplots for the top abundant features. </font>

<font color = brown> **Data preparation for the grouped boxplots from the ANCOM results is described after these calculations; see the detailed procedure further below.** </font>

In [42]:
# Differentially abundant features across cohorts
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Cohort \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Cohort.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-Cohort.qzv[0m
(qiime2) 

: 1

### Perform differential abundance tests for each cohort (one vs. rest)

**<span class="burk"><span class="mark">Chakma</span></span>**

In [43]:
# Differentially abundant features: Chakma vs. Non-Chakma
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Chakma \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Chakma.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-Chakma.qzv[0m
(qiime2) 

: 1

<span class="mark">**Khiyan**</span>

In [44]:
# Differentially abundant features: Khiyan vs. Non-Khiyan
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Khiyan \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Khiyan.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-Khiyan.qzv[0m
(qiime2) 

: 1

**<span class="mark">Tripura</span>**

In [45]:
# Differentially abundant features: Tripura vs. Non-Tripura
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Tripura \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Tripura.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-Tripura.qzv[0m
(qiime2) 

: 1

**<span class="mark">Marma</span>**

In [46]:
# Differentially abundant features: Marma vs. Non-Marma
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Marma \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Marma.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-Marma.qzv[0m
(qiime2) 

: 1

### Now, we will perform differential abundance tests based on different <span class="girk">diseases</span>.

In [51]:
head metadata_v2.tsv

Sample ID	Sex	Age_group	Cohort	HTN	Gastric_issues	DM	Control	Arthritis	Chakma	Khiyan	Marma	Tripura
BC1	Female	Middle-aged	Chakma	Yes	No	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC10	Female	Middle-aged	Chakma	Yes	Yes	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC11	Female	Elder	Chakma	Yes	No	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC12	Female	Aged	Chakma	Yes	No	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC13	Male	Middle-aged	Chakma	No	No	No	Yes	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC14	Female	Elder	Chakma	Yes	Yes	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC15	Female	Aged	Chakma	No	No	No	No	Yes	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC2	Female	Middle-aged	Chakma	Yes	No	No	No	No	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
BC3	Female	Aged	Chakma	Yes	No	No	No	Yes	Chakma	Non-Khiyan	Non-Marma	Non-Tripura
(qiime2)
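Because ANCOM's FDR control is reported to suffer at small sample sizes (see the note on the filtering step), it can be worth checking how many samples fall into each group of a column before interpreting its ANCOM result. A minimal sketch that counts samples per level of a metadata column selected by its header name; the function name `group_sizes` is hypothetical.

```shell
# Hypothetical helper: count samples per level of one metadata column.
# Usage: group_sizes COLUMN < metadata_v2.tsv
group_sizes() {
  awk -v name="$1" 'BEGIN { FS = "\t" }
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col { n[$col]++ }                        # tally each group label
    END {
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      for (level in n) printf "%s\t%d\n", level, n[level]
    }' | sort
}
```

For example, `group_sizes HTN < metadata_v2.tsv` prints one line per level (`No`, `Yes`) with its sample count.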

**<span class="mark">HTN</span>**

In [34]:
# Differentially abundant features by HTN status
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column HTN \
  --verbose \
  --o-visualization diffrential_abundance/ancom-HTN-2.qzv

Saved Visualization to: diffrential_abundance/ancom-HTN-2.qzv
(qiime2)

**<span class="mark">Gastric_issues</span>**

In [35]:
# Differentially abundant features by Gastric_issues status
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Gastric_issues \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Gastric_issues-2.qzv

Saved Visualization to: diffrential_abundance/ancom-Gastric_issues-2.qzv
(qiime2)

**<span class="mark">DM</span>**

In [36]:
# Differentially abundant features by DM status
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column DM \
  --verbose \
  --o-visualization diffrential_abundance/ancom-DM-2.qzv

Saved Visualization to: diffrential_abundance/ancom-DM-2.qzv
(qiime2)

**<span class="mark">Arthritis</span>**

In [37]:
# Differentially abundant features by Arthritis status
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Arthritis \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Arthritis-2.qzv

Saved Visualization to: diffrential_abundance/ancom-Arthritis-2.qzv
(qiime2)

**<span class="mark">Control</span>**

In [38]:
# Differentially abundant features by Control status
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Control \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Control-2.qzv

Saved Visualization to: diffrential_abundance/ancom-Control-2.qzv
(qiime2)

### Now, we will perform differential abundance tests based on Sex and Age_group

**<span class="girk">Based on Sex</span>**

In [39]:
# Differentially abundant features by Sex
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Sex \
  --verbose \
  --o-visualization diffrential_abundance/ancom-Sex-2.qzv

Saved Visualization to: diffrential_abundance/ancom-Sex-2.qzv
(qiime2)

**<span class="mark">Based on Age_group</span>**

In [47]:
# Differentially abundant features by Age_group
qiime composition ancom \
  --i-table qza/comp-filtered-feature-table-ancom.qza \
  --m-metadata-file metadata_v2.tsv \
  --m-metadata-column Age_group \
  --verbose \
  --o-visualization diffrential_abundance/ancom-age-group.qzv

(qiime2) [32mSaved Visualization to: diffrential_abundance/ancom-age-group.qzv[0m
(qiime2) 

: 1
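The ANCOM cells above differ only in the metadata column. A minimal sketch that runs the same command for a list of columns in one loop, with the same input table, metadata file and flags; the function name `run_ancom_all` and the uniform output naming `ancom-<column>.qzv` are hypothetical (the cells above used slightly different names, such as the `-2` suffix).

```shell
# Hypothetical loop over metadata columns, using the same flags as the cells above.
# Usage: run_ancom_all COLUMN [COLUMN ...]
run_ancom_all() {
  local table=qza/comp-filtered-feature-table-ancom.qza
  mkdir -p diffrential_abundance
  for column in "$@"; do
    qiime composition ancom \
      --i-table "$table" \
      --m-metadata-file metadata_v2.tsv \
      --m-metadata-column "$column" \
      --verbose \
      --o-visualization "diffrential_abundance/ancom-${column}.qzv" \
      || echo "ANCOM failed for column: $column" >&2   # continue with the rest
  done
}
```

For example: `run_ancom_all Cohort Chakma Khiyan Tripura Marma HTN Gastric_issues DM Arthritis Control Sex Age_group`.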

### Data preparation for creating group boxplot in R

Before importing the differential abundance data into R, I first prepare the dataset.

When we open the ANCOM visualization (`.qzv`) file, besides the volcano plot it also shows the **ANCOM statistical results** and **Percentile abundances of features by group** tables.

From the **Percentile abundances of features by group** table we take only those features whose **W value** in the **ANCOM statistical results** table is <font color = green>greater than zero</font>. We call these features the <font color = red> **top abundant features** </font>. We then generate grouped boxplots for them, following the steps below.
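Steps 1 and 2 below can also be scripted instead of copying names by hand from LibreOffice Calc. A minimal sketch, assuming the downloaded ANCOM statistical results file is tab-separated with the feature name in the first column and a header row containing a column named `W` (check the header of your own file); the function name `top_features` is hypothetical. It writes the two header words first, followed by every feature whose W is greater than zero.

```shell
# Hypothetical replacement for steps 1-2: build genus.txt from the ANCOM results.
# Usage: top_features < ancom_results.tsv > genus.txt
top_features() {
  printf 'Percentile\nGroup\n'               # header words that grep should also match
  awk 'BEGIN { FS = "\t" }
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "W") w = i; next }
    w && $w > 0 { print $1 }'                # keep features with W > 0
}
```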

1. First, download the **ANCOM statistical results** as a TSV file and open it in **LibreOffice Calc**. From the feature name column, copy only the names of the **top abundant features** and paste them into a text file (let's say **genus.txt**).

2. Add two lines at the top of **genus.txt** containing the words **Percentile** and **Group**, one per line, so that `grep` also keeps the two header lines. The names in **genus.txt** are used as patterns to search for the **top abundant features** in the ***Percentile abundances of features by group*** file.

3. Download the **Percentile abundances of features by group** table in TSV format and give it a name (let's say **percentile_abundance.tsv**).

4. Then run the following command in the terminal (`-F` treats each name as a fixed string, so characters such as `[` or `.` in taxon names are not interpreted as regular-expression syntax):

    `grep -F -w -f genus.txt percentile_abundance.tsv > top_features.tsv`

5. **top_features.tsv** now contains the percentile abundances of only the **top features**, but each feature name still contains the full taxonomic lineage, e.g.

    `k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides`

    Such long labels make the plot hard to read, so we keep only the last taxon name of each feature.

6. To do so, first copy the feature names into a text file (let's say **feature_names.txt**), then run:

    `cut -d ";" -f 6 feature_names.txt > genus_names.txt`

7. Then copy the names from **genus_names.txt** and paste them into **top_features.tsv** in place of the full lineage names.

<font color = blue > This gives us an initial dataset suitable for the R code. </font>
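Steps 4 to 7 can likewise be combined into one pipeline that avoids copying names by hand. This sketch assumes `genus.txt` and `percentile_abundance.tsv` as described in the steps, with feature lineages in the first tab-separated column; the function name `shorten_lineages` is hypothetical. It keeps the last `;`-separated rank of each lineage, which for six-rank lineages from the level-6 collapsed table is the same field that `cut -d ";" -f 6` extracts.

```shell
# Hypothetical helper: replace each full lineage in column 1 with its last rank.
# Header rows ("Percentile", "Group") contain no ";" and pass through unchanged.
shorten_lineages() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      n = split($1, rank, ";")
      if (n > 1) $1 = rank[n]                # e.g. ...;g__Bacteroides -> g__Bacteroides
      print
    }'
}
```

Usage: `grep -F -w -f genus.txt percentile_abundance.tsv | shorten_lineages > top_features.tsv`.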

<font color = red> However, further data transformation is needed in R to create the boxplot; see the **Differential Abundance Plot** section in the **R_codes.ipynb** notebook.