# Functional Prediction

You can learn more about PICRUSt2 on its [GitHub wiki](https://github.com/picrust/picrust2/wiki), in [this tutorial](https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial) and the [Nature Biotechnology article](https://doi.org/10.1038/s41587-020-0548-6).

<a id='setup'></a>
## 0. Setup

In [1]:
import os
import pandas as pd
import qiime2 as q2
import requests

from qiime2 import Visualization

data_dir = 'shared_data/'
    
%matplotlib inline

In [2]:
def fetch_ipath(ids: list, img_output_path: str, verbose: bool = False):
    """Fetches an enriched pathways map from iPATH3 for the given IDs."""
    url = 'https://pathways.embl.de/mapping.cgi'

    # remove colons from EC names (e.g., 'EC:1.1.1.1' -> 'EC1.1.1.1');
    # IDs without a colon, such as KOs, are left unchanged
    ids = [x.replace(':', '') for x in ids]

    if verbose:
        print(f'Fetching iPATH3 diagram for ids: {ids}')
    params = {
        'default_opacity': 0.6,
        'export_type': 'svg',
        'selection': '\n'.join(ids)
    }
    response = requests.get(url=url, params=params)
    # fail early on an HTTP error instead of saving an error page as the image
    response.raise_for_status()

    with open(img_output_path, 'wb') as img:
        img.write(response.content)
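
To see what `fetch_ipath` would send without contacting the server, the sketch below rebuilds the same query parameters offline. `build_ipath_params` is a hypothetical helper introduced only for illustration; the parameter names and values are copied from the function above, and the two IDs are taken from the tables shown later in this notebook.

```python
def build_ipath_params(ids):
    """Build the query parameters used by fetch_ipath (no network request).

    Hypothetical helper for illustration only.
    """
    # remove the colon from EC names, as fetch_ipath does
    cleaned = [x.replace(':', '') for x in ids]
    return {
        'default_opacity': 0.6,
        'export_type': 'svg',
        'selection': '\n'.join(cleaned),
    }


params = build_ipath_params(['EC:1.1.1.1', 'K00001'])
print(params['selection'])
```

Printing the selection before calling `fetch_ipath` is a cheap way to confirm that the IDs are formatted as intended.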

In [3]:
# path to the picrust2 conda environment - do not change!
picrust_env = '/opt/conda/envs/picrust2/bin'

<a id='picrust'></a>
## 1. Functional inference

Use PICRUSt2 to predict the functional content (KEGG orthologs, EC numbers and pathways) of each sample from the denoised sequences and feature table.


In [4]:
%%script env picrust_env="$picrust_env" data_dir="$data_dir" bash

# prepend the env location to PATH so that qiime
# can find all required executables
export PATH=$picrust_env:$PATH

$picrust_env/qiime picrust2 full-pipeline \
    --i-seq $data_dir/Denoising/dada2_rep_set.qza \
    --i-table $data_dir/Denoising/dada2_table.qza \
    --output-dir $data_dir/picrust2_results \
    --p-placement-tool sepp \
    --p-threads 2 \
    --p-hsp-method pic \
    --p-max-nsti 2 

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.


Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ko_metagenome.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ec_metagenome.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/pathway_abundance.qza


In [5]:
# keep only samples with alleged_abduction = '0'; outputs carry the `_abducted` suffix
! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ko_metagenome.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where "[alleged_abduction]='0'" \
    --o-filtered-table $data_dir/picrust2_results/ko_metagenome_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ec_metagenome.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where  "[alleged_abduction]='0'" \
    --o-filtered-table $data_dir/picrust2_results/ec_metagenome_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where  "[alleged_abduction]='0'" \
    --o-filtered-table $data_dir/picrust2_results/pathway_abundance_abducted.qza

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ko_metagenome_abducted.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ec_metagenome_abducted.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/pathway_abundance_abducted.qza

Now we can load all three artifacts with the QIIME 2 Python API and view them as pandas DataFrames:

In [4]:
ko_a = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome_abducted.qza').view(pd.DataFrame)
ec_a = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome_abducted.qza').view(pd.DataFrame)
pa_a = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance_abducted.qza').view(pd.DataFrame)

Let's briefly examine the contents of each of those tables:

In [5]:
ko_a.head(1)

Unnamed: 0,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
0KB68F,1361.400386,22.47605,6638.807061,1283.583009,2819.624601,0.0,12651.473549,4992.0463,5519.033478,3.783984e-21,...,0.024526,0.0,0.0,0.0,609.994359,0.0,0.0,0.0,0.010882,0.00029


In [6]:
ec_a.head(1)

Unnamed: 0,EC:1.1.1.1,EC:1.1.1.10,EC:1.1.1.100,EC:1.1.1.101,EC:1.1.1.102,EC:1.1.1.103,EC:1.1.1.105,EC:1.1.1.107,EC:1.1.1.108,EC:1.1.1.11,...,EC:6.4.1.8,EC:6.5.1.1,EC:6.5.1.2,EC:6.5.1.3,EC:6.5.1.4,EC:6.5.1.5,EC:6.5.1.6,EC:6.5.1.7,EC:6.6.1.1,EC:6.6.1.2
0KB68F,8873.981782,0.001547,11849.507136,1.5720369999999998e-19,0.034769,958.574356,2.834738e-11,0.054358,34.422218,0.0,...,3.210485,1998.437665,6643.33606,266.921716,9.72331e-13,0.0,248.185353,248.185353,170.785566,111.015333


In [7]:
pa_a.head(1)

Unnamed: 0,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THISYN-PWY,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY
0KB68F,5609.428252,56.803589,0.707068,0.0,6474.200627,8260.825591,981.9506,1e-05,1187.378324,5202.509647,...,2405.565906,42.45194,5210.690347,5785.023943,4657.981644,106.453291,40.903373,5043.6082,2.866319,7555.79568


In [10]:
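# repeat the filtering for samples with alleged_abduction = '1'; outputs carry the `_not_abducted` suffix
! qiime feature-table filter-samples \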
    --i-table $data_dir/picrust2_results/ko_metagenome.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where "[alleged_abduction]='1'" \
    --o-filtered-table $data_dir/picrust2_results/ko_metagenome_not_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ec_metagenome.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where  "[alleged_abduction]='1'" \
    --o-filtered-table $data_dir/picrust2_results/ec_metagenome_not_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --p-where  "[alleged_abduction]='1'" \
    --o-filtered-table $data_dir/picrust2_results/pathway_abundance_not_abducted.qza

Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ko_metagenome_not_abducted.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ec_metagenome_not_abducted.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/pathway_abundance_not_abducted.qza

In [8]:
ko_na = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome_not_abducted.qza').view(pd.DataFrame)
ec_na = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome_not_abducted.qza').view(pd.DataFrame)
pa_na = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance_not_abducted.qza').view(pd.DataFrame)

In [9]:
ko_na.head(1)

Unnamed: 0,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
0DOSLC,54182.069959,6.981562,114120.68968,72354.209497,45604.268712,10.652446,79362.866673,59942.895952,33794.194597,5e-06,...,25.698715,37.359132,22.442708,5.201112,19.893207,27133.128495,0.007928,0.0,37.851223,0.0


In [10]:
ec_na.head(1)

Unnamed: 0,EC:1.1.1.1,EC:1.1.1.10,EC:1.1.1.100,EC:1.1.1.101,EC:1.1.1.102,EC:1.1.1.103,EC:1.1.1.105,EC:1.1.1.107,EC:1.1.1.108,EC:1.1.1.11,...,EC:6.4.1.8,EC:6.5.1.1,EC:6.5.1.2,EC:6.5.1.3,EC:6.5.1.4,EC:6.5.1.5,EC:6.5.1.6,EC:6.5.1.7,EC:6.6.1.1,EC:6.6.1.2
0DOSLC,279428.793373,1.404367e-09,333782.8077,3.7427060000000004e-17,0.064341,624.524986,2.125909,0.112491,7238.304061,10.652446,...,3.295559,2392.831915,136766.045596,6958.754988,34.220674,0.002587,941.460175,941.460175,1173.472589,633.545709


In [11]:
pa_na.head(1)

Unnamed: 0,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THISYN-PWY,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY
0DOSLC,63771.529103,3814.332432,269.19087,364.005719,155715.806612,154301.035122,13280.963268,61.024873,91538.094937,81425.820836,...,4605.343494,906.927598,124580.518839,37532.251923,83719.081706,346.681116,169.56297,136811.33978,827.124515,133300.743599


You can see that they look just like the other feature tables we worked with before. The difference is that they no longer contain information about ASVs but about different levels of the functional profile:

1. `ko` table: columns represent KEGG orthologs, as indicated by their names (e.g., **K**19777)
2. `ec` table: columns represent enzymes, as indicated by the Enzyme Commission numbers (e.g., **EC**:1.1.1.108)
3. `pa` table: columns represent entire pathways using the MetaCyc classification (e.g., ANAGLYCOLYSIS-PWY)
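
The naming patterns in the list above are regular enough to tell the three levels apart programmatically. The sketch below is one way to do that; `functional_level` is a hypothetical helper, and treating every other name as a MetaCyc pathway is an assumption made for illustration.

```python
import re


def functional_level(feature_id):
    """Guess the functional level of a feature ID from its naming pattern."""
    if re.fullmatch(r'K\d{5}', feature_id):
        return 'ko'        # KEGG ortholog, e.g. K19777
    if feature_id.startswith('EC:'):
        return 'ec'        # Enzyme Commission number, e.g. EC:1.1.1.108
    return 'pathway'       # assumption: anything else is a MetaCyc pathway ID


for name in ['K19777', 'EC:1.1.1.108', 'ANAGLYCOLYSIS-PWY']:
    print(name, '->', functional_level(name))
```

A check like this can guard against accidentally passing the wrong table to a downstream step.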

<a id='ipath'></a>
### 1.1 Enriched KEGG orthologs visualization

We start by merging each feature table with the grouping column (`disease`) from the metadata:

In [12]:
metadata = pd.read_csv(f'{data_dir}/metadata/sample_metadata.tsv', sep='\t', header=0, index_col=0)

In [13]:
ko_a_meta = ko_a.merge(metadata[['disease']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['disease']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['disease']], left_index=True, right_index=True)

In [27]:
ko_na_meta = ko_na.merge(metadata[['disease']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['disease']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['disease']], left_index=True, right_index=True)
pa_na_meta

Unnamed: 0,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY,disease
0DOSLC,63771.529103,3814.332432,269.19087,364.005719,155715.806612,154301.035122,13280.963268,61.024873,91538.094937,81425.820836,...,906.927598,124580.518839,37532.251923,83719.081706,346.6811,169.56297,136811.33978,827.124515,133300.743599,Leukemia
0WG67D,52492.246729,2795.283692,105.305417,338.491398,113991.857189,112345.430911,12578.28906,30.875975,60516.674666,56468.06064,...,684.908222,90862.382544,60499.276466,59714.134335,981.6188,352.86752,98470.856591,629.269301,95092.619082,Leukemia
12H3HH,1566.689152,110.771443,0.76753,34.983111,3552.142687,3526.986493,357.652313,7.425243,1856.630077,1681.829836,...,40.238435,2793.299811,1712.015878,1674.88876,89.38007,12.636895,3051.311752,20.697951,3015.67706,Non-Hodgkin's Lymphoma
12QINS,2060.699133,175.132817,11.689512,57.304979,5544.819708,5496.283499,354.548907,24.761137,2892.58927,2497.139474,...,67.339397,4290.651443,1776.600557,2395.907599,2.191305,19.414488,4746.855952,35.427232,4264.080373,Leukemia
12ZW3B,3868.513776,425.091009,1.102942,38.113477,6545.144276,6729.880382,644.145772,9.661871,3444.566857,3374.395233,...,47.244421,4803.305543,3774.292469,2948.429096,81.50358,19.437793,5362.34811,35.353132,6197.975063,Leukemia
2I7SIQ,19394.299131,124.559846,0.908484,48.410651,19322.252386,27949.875576,711.345531,11.195332,1806.32288,19968.557918,...,82.546401,19703.582448,21117.382137,10994.016597,65.68585,16.604471,12371.75571,7.279785,24151.867648,Leukemia
3ALGNE,72163.99012,6924.010171,25.053797,1375.161327,147792.962064,146559.785462,24148.140499,203.768629,83572.429933,74357.220212,...,8139.067716,119627.861507,92460.408598,70873.290765,8975.149,681.628084,128839.670201,833.387263,136162.915136,Leukemia
3XK7UB,151843.310887,31.047374,0.182697,227.161183,112025.255066,237342.722003,33551.080042,6.582473,8950.105125,145502.869853,...,141.270223,152388.168749,161176.774858,126910.533547,1891.976,2156.938144,150424.571548,0.298001,205174.804929,Non-Hodgkin's Lymphoma
4O2DTB,42562.792364,1159.286797,13.526408,135.525619,56277.9759,60056.338809,2750.558217,27.523162,23483.176504,40299.157407,...,292.486264,51198.201852,12236.252544,39679.01439,19.90093,57.912956,54505.856787,261.04349,66423.855353,Leukemia
5Y49IM,226.034828,0.0,0.0,0.0,163.591971,300.307783,0.0,0.0,88.830622,208.889126,...,0.0,219.624662,214.85058,177.28319,0.0,0.0,32.583888,0.0,274.984545,Leukemia


Next, we will calculate the average abundance of each KO, EC and pathway within each group (one group per `disease` value):

In [15]:
# collapse samples per disease group - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('disease').mean()
ec_a_meta_avg = ec_a_meta.groupby('disease').mean()
pa_a_meta_avg = pa_a_meta.groupby('disease').mean()
ko_na_meta_avg = ko_na_meta.groupby('disease').mean()
ec_na_meta_avg = ec_na_meta.groupby('disease').mean()
pa_na_meta_avg = pa_na_meta.groupby('disease').mean()
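
The merge-then-average pattern used above can be wrapped in a single function. The sketch below demonstrates it on a toy table with invented values; `group_average` is a hypothetical helper, not part of QIIME 2 or pandas.

```python
import pandas as pd

# toy feature table: samples as rows, features as columns (values invented)
table = pd.DataFrame(
    {'K00001': [10.0, 30.0, 5.0], 'K00002': [0.0, 2.0, 4.0]},
    index=['s1', 's2', 's3'],
)
# toy metadata; sample 's4' has no row in the feature table
meta = pd.DataFrame(
    {'disease': ['Leukemia', 'Leukemia', "Hodgkin's Disease", 'Leukemia']},
    index=['s1', 's2', 's3', 's4'],
)


def group_average(table, metadata, column):
    """Attach one metadata column to the table and average features per group."""
    # merge() defaults to an inner join, so samples missing from either side are dropped
    merged = table.merge(metadata[[column]], left_index=True, right_index=True)
    return merged.groupby(column).mean()


avg = group_average(table, meta, 'disease')
print(avg)
```

Because the join is an inner join, a sample without metadata (or metadata without a sample) silently drops out of the averages, so it can be worth comparing the row counts before and after the merge.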

In [16]:
ko_a_meta_avg

disease,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
Hodgkin's Disease,19028.235594,1065.362017,34387.09214,5062.437658,14231.673873,0.909434,41327.780462,8815.128883,16317.379247,0.002319,...,357.612024,374.245338,82.476389,357.136535,1636.803313,0.0,0.0,0.0,764.903179,0.004102
Leukemia,17227.619162,423.810503,16639.077543,2499.003063,6982.46835,21.041076,18481.732069,11486.349201,9974.081115,0.003699,...,38.012421,371.852232,8.854783,35.672199,5126.323039,0.000112,0.000287,1.960334e-65,190.707274,0.002403
Myelodysplastic Syndromes,13056.800498,1193.270974,14331.828948,360.672058,22843.442305,0.0,28629.95561,3001.358,8689.148528,0.10126,...,0.0,0.180566,0.0,0.0,220.871235,0.0,0.0,0.0,1450.629391,0.0
Non-Hodgkin's Lymphoma,11439.37848,578.621286,26721.551141,3549.133715,21085.984097,1.041146,40508.693863,26224.881159,12625.656417,0.048428,...,352.133688,354.08343,87.898479,351.837638,4058.66147,0.00013,0.0,8.615288e-62,505.411611,0.001922


In [17]:
ec_a_meta_avg

disease,EC:1.1.1.1,EC:1.1.1.10,EC:1.1.1.100,EC:1.1.1.101,EC:1.1.1.102,EC:1.1.1.103,EC:1.1.1.105,EC:1.1.1.107,EC:1.1.1.108,EC:1.1.1.11,...,EC:6.4.1.8,EC:6.5.1.1,EC:6.5.1.2,EC:6.5.1.3,EC:6.5.1.4,EC:6.5.1.5,EC:6.5.1.6,EC:6.5.1.7,EC:6.6.1.1,EC:6.6.1.2
Hodgkin's Disease,59201.622706,1.936135,84269.563776,1.420883e-19,0.492604,9563.47267,4.016185e-10,1.398176,204.295806,0.909434,...,30.085516,6494.472966,45453.742876,4127.30498,371.255156,4.711116e-93,1369.242137,1369.242137,3375.390229,2979.604866
Leukemia,42676.254918,1.48646,59547.867017,6.923547999999999e-19,4.797126,1508.77492,0.4480973,0.850152,201.423154,21.041076,...,4.553,3345.707116,24506.12636,2459.493577,362.636418,6.425743e-05,510.747908,510.747908,1154.913759,1857.370608
Myelodysplastic Syndromes,21570.018224,0.930931,76914.138717,6.164715e-21,1e-05,7444.313042,0.0,0.361765,3975.211256,0.0,...,0.0,3237.27934,34037.845491,3234.30545,0.641981,0.0,338.171783,338.171783,6838.272213,12205.136175
Non-Hodgkin's Lymphoma,47317.641787,3.671824,73667.712361,9.768579999999999e-19,0.132309,3453.161945,0.08812602,2.626358,1960.874303,1.041146,...,98.21908,8831.497716,34687.161603,3556.853184,454.37882,5.192716e-05,1349.660664,1349.660664,5263.938253,4059.135492


In [18]:
pa_a_meta_avg

disease,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THISYN-PWY,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY
Hodgkin's Disease,31162.419166,236.455288,5.834883,1765.841533,28808.684822,41706.15374,5869.759793,691.860993,5122.506043,31287.6296,...,17824.192086,350.323012,30792.296334,32061.643846,29981.198004,654.706874,1455.22073,18286.029379,161.726056,44546.328173
Leukemia,14474.596924,228.825864,98.334934,690.832831,22961.374273,26294.150322,3087.242212,261.65823,4693.925958,14788.007768,...,6551.055647,406.654119,17005.916606,15561.742897,13460.415022,241.475892,1396.212503,18221.914383,69.064722,20282.531193
Myelodysplastic Syndromes,27539.507557,10.19969,2.598471,100.701889,28461.994549,37301.772122,6832.735976,6.832042,5473.324694,21181.456474,...,21293.844391,30.95701,30566.034246,28779.619168,19231.325066,212.058219,642.385398,16477.394555,0.138583,35292.947268
Non-Hodgkin's Lymphoma,23344.962993,255.005188,7.088741,1103.823229,28260.608367,35310.414158,6211.704743,566.729025,7420.252549,23448.637803,...,8995.458905,678.610439,27794.143991,25098.281772,18179.796477,502.104125,949.595462,24969.884756,30.244374,31692.309386


In [19]:
ko_na_meta_avg

disease,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
Hodgkin's Disease,34953.4687,2.55238,41150.899491,4825.747676,6031.352334,2.987571,2319.676886,1318.989544,6554.63751,1.033848e-21,...,516.795292,60.329399,242.171153,14.429723,57.305988,1022.062706,0.0,0.0,1147.702727,0.42267
Leukemia,12857.970171,87.614986,36454.100688,25317.228261,17271.679642,12.264023,26162.447626,23579.801853,13815.782371,0.0007105095,...,1459.920138,671.360458,696.323734,146.345722,601.635315,12723.246532,0.010643,3.690182e-59,1491.308203,0.000295
Myelodysplastic Syndromes,1994.688516,0.0,6361.559141,0.374649,3321.432803,0.0,19096.681963,25235.315634,8.007399,0.0,...,0.0,0.0,0.0,0.0,0.0,3297.361743,0.0,0.0,40.44593,0.0
Non-Hodgkin's Lymphoma,16468.199466,76.265046,41743.571134,4283.070794,15978.635289,0.0,53992.016472,63361.706906,1237.352134,1.9271329999999998e-21,...,2.694251,2.572491,3.047133,0.639329,2.559274,7232.517849,0.0,0.0,147.29541,0.006617


In [20]:
ec_na_meta_avg

disease,EC:1.1.1.1,EC:1.1.1.10,EC:1.1.1.100,EC:1.1.1.101,EC:1.1.1.102,EC:1.1.1.103,EC:1.1.1.105,EC:1.1.1.107,EC:1.1.1.108,EC:1.1.1.11,...,EC:6.4.1.8,EC:6.5.1.1,EC:6.5.1.2,EC:6.5.1.3,EC:6.5.1.4,EC:6.5.1.5,EC:6.5.1.6,EC:6.5.1.7,EC:6.6.1.1,EC:6.6.1.2
Hodgkin's Disease,231499.446004,2e-06,53734.031499,4.042741e-20,1.012892,1244.27278,0.04558188,41.374385,111.70763,2.987571,...,21.85233,1718.308313,43140.181236,2953.253153,60.367298,3.3583439999999996e-124,359.114744,359.114744,1102.277545,1131.022601
Leukemia,82104.480065,0.27112,124752.655966,1.0006400000000001e-17,0.333364,2603.384905,0.2769499,0.086174,4841.721009,12.264023,...,1.874056,7091.477817,44897.114723,9195.087383,3605.289367,0.008241254,747.180949,747.180949,6465.561181,7710.808756
Myelodysplastic Syndromes,3918.127866,0.0,9123.134872,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,3291.16,6431.549308,0.0,0.0,0.0,0.0,0.0,240.10436,198.473166
Non-Hodgkin's Lymphoma,87493.46437,0.550642,73266.07786,2.8410709999999998e-19,0.017707,3856.331836,1.443694e-11,0.24562,91.187411,0.0,...,13.319997,12216.842675,42925.781427,295.685416,4.284694,0.0,294.224634,294.224634,664.838419,482.420288


In [21]:
pa_na_meta_avg

disease,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THISYN-PWY,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY
Hodgkin's Disease,42485.462649,864.096703,459.858644,3086.112382,16521.682461,44396.42784,2411.114775,1637.173916,2216.609938,43174.928143,...,4788.809612,1372.54673,42802.903098,28873.829793,46079.973418,536.934202,1363.540412,42289.663817,0.743029,58880.829111
Leukemia,29240.267115,6260.231204,822.717498,5513.193568,46508.56544,48800.519429,11017.178063,2465.368221,25652.647536,29915.998824,...,10236.472859,2946.600518,41146.338823,29254.868565,28071.626259,4261.056857,5609.540358,42331.36625,217.641211,44847.75365
Myelodysplastic Syndromes,3664.39311,0.0,0.0,0.0,5598.846816,8351.513944,0.0,0.0,28.119596,5051.817137,...,161.837161,0.0,6108.471542,6047.609136,3954.873567,0.0,0.0,6531.384339,0.0,9099.813351
Non-Hodgkin's Lymphoma,42112.848363,36.017001,0.242872,68.961719,23851.1864,59788.860836,6819.374557,6.207856,2307.334787,41111.530388,...,10085.182092,43.019198,42527.89724,33097.505587,38342.722224,398.477999,444.067383,42166.907824,5.414037,57810.029458


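### 1.2 Enriched pathways

We first add a pseudocount to the full pathway abundance table, then run ANCOM to test for pathways that differ between the `alleged_abduction` groups: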



In [23]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_abund.qza

Saved FeatureTable[Composition] to: shared_data//picrust2_results/pathway_abundance_abund.qza

In [25]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abund.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --m-metadata-column alleged_abduction \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted.qzv

Saved Visualization to: shared_data//pa_ancom_abducted.qzv

In [26]:
Visualization.load(f'{data_dir}/pa_ancom_abducted.qzv')
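
The `add-pseudocount` step precedes ANCOM because the `log` transform requested via `--p-transform-function log` is undefined at zero, and predicted abundance tables contain zeros. The sketch below illustrates this with invented values; a pseudocount of 1 is assumed here, and `add_pseudocount` is a hypothetical stand-in, not the QIIME 2 implementation.

```python
import math


def add_pseudocount(values, pseudocount=1):
    """Shift every abundance by a constant so that no value is zero."""
    return [v + pseudocount for v in values]


abundances = [0.0, 9.0, 99.0]  # invented values; note the zero

try:
    math.log(abundances[0])
    zero_log_fails = False
except ValueError:
    zero_log_fails = True  # the logarithm of zero is undefined

shifted = add_pseudocount(abundances)
log_values = [math.log(v) for v in shifted]
print(zero_log_fails, shifted, log_values)
```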

In [28]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza

! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza

Saved FeatureTable[Composition] to: shared_data//picrust2_results/pathway_abundance_abducted_abund.qza
Saved FeatureTable[Composition] to: shared_data//picrust2_results/pathway_abundance_not_abducted_abund.qza

In [None]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column disease \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted_disease.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column disease \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_not_abducted_disease.qzv

In [36]:
Visualization.load(f'{data_dir}/pa_ancom_abducted_disease.qzv')


In [37]:
Visualization.load(f'{data_dir}/pa_ancom_not_abducted_disease.qzv')
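
The two ANCOM calls above differ only in the sample subset, so the command can be generated rather than copied. The sketch below only builds the command strings and does not run QIIME 2; `ancom_command` is a hypothetical helper, while the flags and file names are copied from the cells above.

```python
def ancom_command(subset, column, data_dir='shared_data', suffix=None):
    """Build the `qiime composition ancom` command for one subset and column.

    Hypothetical helper: it returns a string and executes nothing.
    """
    suffix = suffix or column  # name used in the output file
    return ' '.join([
        'qiime composition ancom',
        f'--i-table {data_dir}/picrust2_results/pathway_abundance_{subset}_abund.qza',
        f'--m-metadata-file {data_dir}/metadata/str_nan_metadata.tsv',
        f'--m-metadata-column {column}',
        '--p-transform-function log',
        f'--o-visualization {data_dir}/pa_ancom_{subset}_{suffix}.qzv',
    ])


commands = [ancom_command(s, 'disease') for s in ('abducted', 'not_abducted')]
for cmd in commands:
    print(cmd)
```

Passing a different `column` (and, if desired, a shorter `suffix`) gives the matching command for another metadata variable.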

## 2. Functional Inference on Stool Consistency

In [39]:
metadata = pd.read_csv(f'{data_dir}/metadata/str_metadata.tsv', sep='\t', header=0, index_col=0)

In [41]:
ko_a_meta = ko_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ko_na_meta = ko_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)


In [42]:
# collapse samples per stool_consistency group - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('stool_consistency').mean()
ec_a_meta_avg = ec_a_meta.groupby('stool_consistency').mean()
pa_a_meta_avg = pa_a_meta.groupby('stool_consistency').mean()
ko_na_meta_avg = ko_na_meta.groupby('stool_consistency').mean()
ec_na_meta_avg = ec_na_meta.groupby('stool_consistency').mean()
pa_na_meta_avg = pa_na_meta.groupby('stool_consistency').mean()

In [43]:
ko_a_meta_avg

stool_consistency,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
formed,6037.30679,773.65638,20189.709305,2051.262596,11894.709954,0.817587,29769.856206,16085.548045,12866.066883,0.040517,...,343.760265,346.83513,84.681018,343.471666,2282.388267,0.0,0.0,0.0,513.854295,0.000347
liquid,18708.284957,529.118973,28660.095238,2850.176466,21045.077804,17.110777,43873.721815,24820.053589,13028.297977,0.020591,...,15.9071,561.045843,3.651652,15.008893,2734.697251,0.0003854027,0.000478,3.267224e-65,341.382649,0.003919
semi-formed,27573.87634,190.813369,19447.748558,5223.539693,10406.258055,19.993414,15489.318023,10327.139551,8483.660044,5.8e-05,...,70.334807,85.938083,16.743543,66.983448,9193.116584,7.763663e-132,0.0,1.435881e-61,273.870191,0.003988


In [44]:
ko_na_meta_avg

stool_consistency,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
formed,10147.00615,38.277427,28631.098128,15020.481029,18270.32595,24.834485,22120.136212,22564.353026,14162.836028,1.333052e-21,...,3313.347149,1577.393486,1586.251307,343.85876,1414.225607,9928.389741,0.0,0.0,3350.177255,0.000102
liquid,17736.375188,93.757042,50932.307112,27855.87567,18077.871448,1.568945,49075.026567,48127.915567,9740.345325,0.001499965,...,130.917774,10.554224,50.823766,2.131272,8.428592,15485.064326,0.022468,7.790385e-59,243.304554,0.003744
semi-formed,13643.017988,99.836616,29046.519817,15155.787109,12060.491574,2.578626,18869.624145,19745.890068,8253.719387,6.1928999999999996e-21,...,64.856889,12.767183,37.792601,3.127061,12.390128,7047.888993,0.0,0.0,140.924726,0.047428
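
The two KO tables do not list identical columns: among the trailing columns displayed, K19787 appears only in the first table and K19776 only in the second. Before comparing the two subsets feature by feature, it can help to check which columns they share. The sketch below does this with plain sets on short column lists hand-picked from the displayed headers; these lists are illustrative and are not the full tables.

```python
# columns hand-picked from the displayed headers (illustrative, not the full tables)
ko_a_cols = ['K19777', 'K19778', 'K19785', 'K19787', 'K19788']
ko_na_cols = ['K19776', 'K19777', 'K19778', 'K19785', 'K19788']

shared = sorted(set(ko_a_cols) & set(ko_na_cols))   # present in both lists
only_a = sorted(set(ko_a_cols) - set(ko_na_cols))   # present only in the first list
only_na = sorted(set(ko_na_cols) - set(ko_a_cols))  # present only in the second list

print('shared:', shared)
print('only in first:', only_a)
print('only in second:', only_na)
```

With the real DataFrames, the same comparison could be run on `ko_a.columns` and `ko_na.columns`.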


In [45]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column stool_consistency \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted_stool_consistency.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column stool_consistency \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_not_abducted_stool_consistency.qzv

Saved Visualization to: shared_data//pa_ancom_abducted_stool_consistency.qzv
Saved Visualization to: shared_data//pa_ancom_not_abducted_stool_consistency.qzv

In [46]:
Visualization.load(f'{data_dir}/pa_ancom_abducted_stool_consistency.qzv')

In [47]:
Visualization.load(f'{data_dir}/pa_ancom_not_abducted_stool_consistency.qzv')

## 3. Functional Inference on Categorical Time Relative to Engraftment


In [48]:
ko_a_meta = ko_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ko_na_meta = ko_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)

In [49]:
# collapse samples per categorical_time_relative_to_engraftment group - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
ec_a_meta_avg = ec_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
pa_a_meta_avg = pa_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
ko_na_meta_avg = ko_na_meta.groupby('categorical_time_relative_to_engraftment').mean()
ec_na_meta_avg = ec_na_meta.groupby('categorical_time_relative_to_engraftment').mean()
pa_na_meta_avg = pa_na_meta.groupby('categorical_time_relative_to_engraftment').mean()

In [50]:
ko_a_meta_avg

categorical_time_relative_to_engraftment,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
peri,2114.150651,335.571804,6704.060157,648.476811,3622.080928,1.320137,12160.84125,7810.594355,5321.334654,1.1e-05,...,31.153772,46.877726,7.337195,29.352181,932.024107,0.0,0.0,0.0,81.919162,9.7e-05
post,38891.56807,1018.57399,48330.342573,7486.202129,32076.200421,29.300452,62746.992648,40142.072759,22515.633123,0.077798,...,556.583222,1015.152726,136.602004,554.0172,11353.594292,0.000323,0.0004,1.102229e-61,1114.385819,0.006685
pre,3349.070585,360.71458,10606.952331,1018.165276,5988.067033,0.59576,14760.258998,5155.612975,6561.072315,0.00112,...,14.079083,17.604554,3.483824,13.948624,614.715979,0.0,0.0,0.0,78.476669,8.2e-05


In [51]:
ko_na_meta_avg

categorical_time_relative_to_engraftment,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
peri,1071.346104,152.46512,8949.881683,2133.061097,4159.275767,0.0,20253.321044,9911.320896,7723.821127,1.03215e-21,...,2.666667,2.673357,2.666667,0.665968,2.66591,1445.448955,0.0,0.0,2.673435,7.9e-05
post,20671.457512,71.474948,50965.499933,28947.815703,21931.421817,12.972971,37881.680954,41307.699574,12875.523434,0.0007940989,...,1661.409336,754.177675,792.338813,164.482187,676.069153,15965.293801,0.011895,4.124322e-59,1776.659257,0.027125
pre,1493.628627,62.682144,9265.847245,1459.460499,5349.312766,2.577253,13952.401125,9902.369913,5532.318145,0.0,...,2.798408,0.0,1.632991,0.0,0.0,1079.559387,0.0,0.0,8.042315,0.0


In [52]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column categorical_time_relative_to_engraftment \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted_categorical_time.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column categorical_time_relative_to_engraftment \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_not_abducted_categorical_time.qzv

Saved Visualization to: shared_data//pa_ancom_abducted_categorical_time.qzv
Saved Visualization to: shared_data//pa_ancom_not_abducted_categorical_time.qzv

In [53]:
Visualization.load(f'{data_dir}/pa_ancom_abducted_categorical_time.qzv')

In [54]:
Visualization.load(f'{data_dir}/pa_ancom_not_abducted_categorical_time.qzv')