# 07. Function Inference

Author: Minghang Li

In this Jupyter Notebook, we infer functional profiles from amplicon
sequencing results using the `picrust2` plugin.

<div style="background-color: lightsalmon; padding: 10px;">
    
**NOTE**: The `q2-picrust2` plugin is not compatible with `qiime2-2024.10`.
</div>



**Notebook overview**<br>
[1. Setup](#setup)<br>
[2. Run `picrust2` full pipeline](#full_pipeline)<br>
[3. Visualization](#visualization)<br>
[4. Thoughts and discussion](#discussion)<br>

## 1. Setup

In [1]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [2]:
# get project root by finding .git folder
root = !git rev-parse --show-toplevel
root = root[0]

# directory paths used throughout the notebook
raw_data_dir = os.path.join(root, "data/raw")
data_dir = os.path.join(root, "data/processed")
vis_dir  = os.path.join(root, "results")

## 2. Run `picrust2` full pipeline

PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) is a software tool for predicting functional abundances from marker gene sequences alone (here, 16S rRNA).

"Function" here refers to gene families, such as KEGG orthologs (KO) and Enzyme Commission (EC) numbers, and to metabolic (MetaCyc) pathways.

Here we use **maximum parsimony** (`mp`) for hidden state prediction (HSP) and the **SEPP** (SATé-Enabled Phylogenetic Placement) method for tree placement. `EPA-NG` could not be used because of limited RAM, and SEPP also keeps the placement consistent with `q2-fragment-insertion`. `--p-edge-exponent` was set to `0` for the pipeline to run successfully (as SEPP handles branch weighting internally).
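
Once hidden state prediction has assigned gene-family copy numbers to each ASV, the predicted metagenome is in essence a weighted sum: each ASV count is divided by its predicted 16S copy number and multiplied by its predicted gene-family copy number, then summed per sample. The sketch below illustrates that arithmetic only. All ASV names, KO identifiers, copy numbers, and counts are hypothetical toy values, and `predict_metagenome` is an illustrative helper, not part of the PICRUSt2 or QIIME 2 API.

```python
# Toy illustration of metagenome prediction; all values are hypothetical.

# Observed ASV counts per sample (as in a feature table).
asv_counts = {
    "sample_A": {"asv1": 100, "asv2": 40},
    "sample_B": {"asv1": 10, "asv2": 80},
}

# Predicted 16S rRNA gene copies per genome for each ASV.
marker_copies = {"asv1": 2, "asv2": 4}

# Predicted gene-family (KO) copies per genome for each ASV.
ko_copies = {
    "asv1": {"K00001": 1, "K00002": 0},
    "asv2": {"K00001": 2, "K00002": 3},
}


def predict_metagenome(asv_counts, marker_copies, ko_copies):
    """Sum copy-number-normalized ASV abundances weighted by KO copies."""
    predicted = {}
    for sample, counts in asv_counts.items():
        totals = {}
        for asv, count in counts.items():
            # Normalize by 16S copy number to approximate organism abundance.
            organisms = count / marker_copies[asv]
            for ko, copies in ko_copies[asv].items():
                totals[ko] = totals.get(ko, 0.0) + organisms * copies
        predicted[sample] = totals
    return predicted


ko_table = predict_metagenome(asv_counts, marker_copies, ko_copies)
print(ko_table)
```

In this toy case `asv2` dominates the predicted `K00002` abundance in both samples because `asv1` is predicted to carry no copy of it.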

<div style="background-color: aliceblue; padding: 10px;">

**NOTE**: `qiime picrust2 custom-tree-pipeline` only accepts trees built with `qiime fragment-insertion sepp` as input, so the tree built in `03_phylogeny.ipynb` cannot be used.
</div>

In [9]:
! qiime picrust2 full-pipeline \
    --i-table $data_dir/table-filtered.qza \
    --i-seq $data_dir/rep-seqs-filtered.qza \
    --p-threads 4 \
    --p-hsp-method mp \
    --p-edge-exponent 0 \
    --p-placement-tool sepp \
    --output-dir $data_dir/q2-picrust2_fullpipeline \
    --verbose


This is the set of poorly aligned input sequences to be excluded: 6a4eaf1e947a173484f8ef3bff28907b, e42dfd4ac46119aac2bfb17fbaae27b8, 84c245c44938522d80bfe6f6345341ae, ad3c31a48752018b8466c44c29505c5e





All ASVs were below the max NSTI cut-off of 2.0 and so all were retained for downstream analyses.

All ASVs were below the max NSTI cut-off of 2.0 and so all were retained for downstream analyses.


Saved FeatureTable[Frequency] to: q2-picrust2_fullpipeline/ko_metagenome.qza
Saved FeatureTable[Frequency] to: q2-picrust2_fullpipeline/ec_metagenome.qza
Saved FeatureTable[Frequency] to: q2-picrust2_fullpipeline/pathway_abundance.qza
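
The log above reports that every ASV fell below the maximum NSTI cut-off of 2.0. NSTI (nearest sequenced taxon index) measures how far an ASV sits from its closest reference genome in the tree, so ASVs with large values have less reliable predictions and are dropped. The following is a minimal sketch of that filtering step. The NSTI values are hypothetical, `split_by_nsti` is an illustrative helper, and treating the cut-off as inclusive is an assumption.

```python
# Hypothetical NSTI values per ASV; real values come from PICRUSt2 itself.
MAX_NSTI = 2.0  # cut-off reported in the pipeline log

nsti = {"asv1": 0.03, "asv2": 0.41, "asv3": 2.70}


def split_by_nsti(nsti, max_nsti=MAX_NSTI):
    """Separate ASVs into those kept and those dropped by the NSTI cut-off."""
    # Assumption: an ASV exactly at the cut-off is kept.
    kept = {asv: value for asv, value in nsti.items() if value <= max_nsti}
    dropped = {asv: value for asv, value in nsti.items() if value > max_nsti}
    return kept, dropped


kept, dropped = split_by_nsti(nsti)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

In our run nothing was dropped, which corresponds to an empty `dropped` dictionary.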

## 3. Visualization


In [33]:
picrust_res = f"{data_dir}/q2-picrust2_fullpipeline"

### 3.1 Summarize the table

Summarize each predicted table with `qiime feature-table summarize`.

In [27]:
! qiime feature-table summarize \
   --i-table $picrust_res/ko_metagenome.qza \
   --o-visualization $picrust_res/ko_metagenome.qzv

! qiime feature-table summarize \
   --i-table $picrust_res/ec_metagenome.qza \
   --o-visualization $picrust_res/ec_metagenome.qzv

! qiime feature-table summarize \
   --i-table $picrust_res/pathway_abundance.qza \
   --o-visualization $picrust_res/pathway_abundance.qzv

Saved Visualization to: /home/jovyan/project/alien/data/processed/q2-picrust2_fullpipeline/ko_metagenome.qzv
Saved Visualization to: /home/jovyan/project/alien/data/processed/q2-picrust2_fullpipeline/ec_metagenome.qzv
Saved Visualization to: /home/jovyan/project/alien/data/processed/q2-picrust2_fullpipeline/pathway_abundance.qzv

In [34]:
Visualization.load(f"{picrust_res}/ko_metagenome.qzv")

In [29]:
Visualization.load(f"{picrust_res}/ec_metagenome.qzv")

In [30]:
Visualization.load(f"{picrust_res}/pathway_abundance.qzv")

### 3.2 Compute diversity

#### 3.2.1 KEGG

In [None]:
! qiime diversity alpha-rarefaction \
    --i-table $picrust_res/ko_metagenome.qza \
    --p-max-depth 30000 \
    --m-metadata-file $data_dir/metadata.tsv \
    --o-visualization $data_dir/ko_metagenome/alpha-rarefaction-ko.qzv

In [None]:
! qiime diversity core-metrics \
   --i-table $picrust_res/ko_metagenome.qza \
   --p-sampling-depth ? \
   --m-metadata-file $data_dir/metadata.tsv \
   --output-dir ko_metagenome_core_metrics \
   --p-n-jobs 4
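
The `--p-sampling-depth` value is still a placeholder and has to be chosen from the per-sample totals shown in the summary visualizations. One way to reason about the trade-off is to count how many samples, and how many observations, survive at each candidate depth. The sketch below does that for made-up totals. The sample names, totals, and candidate depths are all hypothetical and must be replaced with the values from the `.qzv` summaries.

```python
# Hypothetical per-sample total frequencies, as reported by
# `qiime feature-table summarize`; replace with the real values.
sample_totals = {"S1": 41250, "S2": 30875, "S3": 18640, "S4": 52010, "S5": 9120}


def retention_at_depth(sample_totals, depth):
    """Return (samples kept, observations kept) when rarefying to `depth`."""
    # Samples with fewer than `depth` observations are discarded entirely.
    kept = [s for s, total in sample_totals.items() if total >= depth]
    # Every retained sample is subsampled to exactly `depth` observations.
    return len(kept), len(kept) * depth


for depth in (9000, 18000, 30000):
    n_samples, n_obs = retention_at_depth(sample_totals, depth)
    print(f"depth={depth}: {n_samples} samples, {n_obs} observations retained")
```

A deeper cut keeps more observations per sample but loses shallow samples, so the choice depends on which samples the downstream comparison cannot afford to lose.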

#### 3.2.2 EC counts

In [None]:
! qiime diversity alpha-rarefaction \
    --i-table $picrust_res/ec_metagenome.qza \
    --p-max-depth 10000 \
    --m-metadata-file $data_dir/metadata.tsv \
    --o-visualization $data_dir/alpha-rarefaction-ec.qzv

In [None]:
! qiime diversity core-metrics \
   --i-table $picrust_res/ec_metagenome.qza \
   --p-sampling-depth ? \
   --m-metadata-file $data_dir/metadata.tsv \
   --output-dir ec_metagenome_core_metrics \
   --p-n-jobs 4

#### 3.2.3 Pathway abundance

In [None]:
! qiime diversity alpha-rarefaction \
    --i-table $picrust_res/pathway_abundance.qza \
    --p-max-depth 5000 \
    --m-metadata-file $data_dir/metadata.tsv \
    --o-visualization $data_dir/alpha-rarefaction-pathabund.qzv

In [None]:
! qiime diversity core-metrics \
   --i-table $picrust_res/pathway_abundance.qza \
   --p-sampling-depth ? \
   --m-metadata-file $data_dir/metadata.tsv \
   --output-dir pathway_abundance_core_metrics \
   --p-n-jobs 4

## 4. Thoughts and discussion

Although amplicon-based predictions may correlate strongly with functional profiles derived from shotgun metagenomics sequencing, differential abundance results will likely differ substantially from those obtained with shotgun data. Since we have requested data from our TA, we should compare the results of the shotgun metagenomics analysis with the PICRUSt2 predictions.
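
One simple way to make such a comparison is a rank correlation between predicted and measured abundances of the same gene families within a sample. The sketch below implements Spearman correlation with the standard library only. The two abundance vectors are hypothetical, and this is just one possible comparison, not an analysis we have run.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical abundances of the same five KOs in one sample.
predicted = [70.0, 30.0, 12.0, 0.0, 5.0]      # PICRUSt2 prediction
shotgun = [820.0, 410.0, 90.0, 15.0, 130.0]   # shotgun metagenomics

rho = spearman(predicted, shotgun)
print(f"Spearman rho = {rho:.2f}")
```

A high correlation here would not by itself guarantee agreement in differential abundance testing, which is the caveat raised above.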

## Bibliography

[1] G. M. Douglas et al., “PICRUSt2 for prediction of metagenome functions,” Nature Biotechnology, vol. 38, no. 6, pp. 685–688, Jun. 2020, doi: https://doi.org/10.1038/s41587-020-0548-6.

[2] S. Purushothaman, M. Meola, and A. Egli, “Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics,” International Journal of Molecular Sciences, vol. 23, no. 17, p. 9834, Aug. 2022, doi: https://doi.org/10.3390/ijms23179834.