# Functional Prediction

You can learn more about PICRUSt2 on its [GitHub wiki](https://github.com/picrust/picrust2/wiki), in [this tutorial](https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial) and the [Nature Biotechnology article](https://doi.org/10.1038/s41587-020-0548-6).

<a id='setup'></a>
## 0. Setup

In [2]:
import os
import pandas as pd
import qiime2 as q2
import requests

from qiime2 import Visualization

data_dir = '../data/'
    
%matplotlib inline

In [3]:
def fetch_ipath(ids: list, img_output_path: str, verbose: bool = False):
    """Fetches an enriched pathways map from iPATH3 for the given IDs."""
    url = 'https://pathways.embl.de/mapping.cgi'
    
    # remove colons from EC names (e.g. 'EC:2.7.7.7' -> 'EC2.7.7.7');
    # checking every ID also avoids an IndexError on an empty list
    ids = [x.replace(':', '') for x in ids]
    
    if verbose:
        print(f'Fetching iPATH3 diagram for ids: {ids}')
    params = {
        'default_opacity': 0.6,
        'export_type': 'svg',
        'selection': '\n'.join(ids)
    }
    response = requests.get(url=url, params=params)
    # fail loudly instead of writing an error page to the image file
    response.raise_for_status()
    
    with open(img_output_path, 'wb') as img:
        img.write(response.content)
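
The request parameters that `fetch_ipath` sends can be inspected without contacting the server. A minimal sketch, assuming a hypothetical helper `build_ipath_params` (not part of the notebook) that mirrors the ID clean-up and payload construction above:

```python
def build_ipath_params(ids, opacity=0.6, export_type='svg'):
    """Build the parameter dict used by fetch_ipath (hypothetical helper)."""
    # Strip the colon from every ID, e.g. 'EC:2.7.7.7' -> 'EC2.7.7.7';
    # KO IDs such as 'K01990' contain no colon and pass through unchanged.
    cleaned = [x.replace(':', '') for x in ids]
    return {
        'default_opacity': opacity,
        'export_type': export_type,
        'selection': '\n'.join(cleaned),  # one ID per line
    }


params = build_ipath_params(['EC:2.7.7.7', 'K01990'])
print(params['selection'])
```

Printing the `selection` string before sending it is a cheap way to confirm that the IDs are in the form the mapping service is expected to accept.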

In [4]:
# path to the picrust2 conda environment - do not change!
picrust_env = '/opt/conda/envs/picrust2/bin'

<a id='picrust'></a>
## 1. Functional inference

Use PICRUSt2 to predict metagenome functional profiles from our dataset.


In [4]:
%%script env picrust_env="$picrust_env" data_dir="$data_dir" bash

# append the env location to PATH so that qiime
# can find all required executables
export PATH=$picrust_env:$PATH

$picrust_env/qiime picrust2 full-pipeline \
    --i-seq $data_dir/denoising/dada2_rep_set.qza \
    --i-table $data_dir/denoising/dada2_table.qza \
    --output-dir $data_dir/picrust2_results \
    --p-placement-tool sepp \
    --p-threads 2 \
    --p-hsp-method pic \
    --p-max-nsti 2 

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.


Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ko_metagenome.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/ec_metagenome.qza
Saved FeatureTable[Frequency] to: shared_data//picrust2_results/pathway_abundance.qza


## 2. Analyzing the Effect of Abduction

Here, we look for functional pathways whose abundance differs between abducted and non-abducted patients.

In [8]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_abund.qza

Saved FeatureTable[Composition] to: ../data//picrust2_results/pathway_abundance_abund.qza

In [9]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abund.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --m-metadata-column alleged_abduction \
    --p-transform-function log \
    --o-visualization $data_dir/functional_prediction/pa_ancom_abducted.qzv

Saved Visualization to: ../data//functional_prediction/pa_ancom_abducted.qzv

In [10]:
Visualization.load(f'{data_dir}/functional_prediction/pa_ancom_abducted.qzv')
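
The `ancom` call above uses `--p-transform-function log`, and the logarithm of zero is undefined, which is why a pseudocount is added to the table first. A minimal sketch of that idea on a made-up table, assuming a pseudocount of 1 (the sample names, pathway names and values are illustrative only, and this does not reproduce what `qiime composition` runs internally):

```python
import numpy as np
import pandas as pd

# Toy pathway table: rows are samples, columns are pathways (made-up values).
table = pd.DataFrame(
    {'PWY-A': [0.0, 12.0], 'PWY-B': [7.0, 0.0]},
    index=['sample1', 'sample2'],
)

pseudocount = 1  # assumed value for this sketch
shifted = table + pseudocount  # no zeros remain
logged = np.log(shifted)       # natural log is now defined everywhere

print(logged)
```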

### 2.1 Enriched KEGG orthologs visualization

We start by finding the most abundant features in each table:

In [6]:
ko = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome.qza').view(pd.DataFrame)
ec = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome.qza').view(pd.DataFrame)
pa = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance.qza').view(pd.DataFrame)

In [7]:
metadata = pd.read_csv(f'{data_dir}/metadata/str_metadata.tsv', sep='\t', header=0, index_col=0)

ko_meta = ko.merge(metadata[['alleged_abduction']], left_index=True, right_index=True)
ec_meta = ec.merge(metadata[['alleged_abduction']], left_index=True, right_index=True)
pa_meta = pa.merge(metadata[['alleged_abduction']], left_index=True, right_index=True)

ko_meta_avg = ko_meta.groupby('alleged_abduction').mean()
ec_meta_avg = ec_meta.groupby('alleged_abduction').mean()
pa_meta_avg = pa_meta.groupby('alleged_abduction').mean()
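
The merge-then-average step above can be sketched on a toy table. The sample IDs and values below are made up. The sketch also shows one thing worth checking on real data: an index-on-index `merge` is an inner join by default, so samples without a metadata row are dropped silently.

```python
import pandas as pd

# Toy feature table: rows are samples, columns are features (made-up values).
features = pd.DataFrame(
    {'K00001': [10.0, 30.0, 5.0], 'K00002': [0.0, 4.0, 2.0]},
    index=['s1', 's2', 's3'],
)
# Toy metadata; note that 's3' has no metadata row.
meta = pd.DataFrame(
    {'alleged_abduction': ['abducted', 'abducted']},
    index=['s1', 's2'],
)

# Index-on-index merge is an inner join by default, so 's3' is dropped.
merged = features.merge(meta, left_index=True, right_index=True)
dropped = features.index.difference(merged.index)
print(f'Samples without metadata: {dropped.tolist()}')

# Average every feature within each group.
avg = merged.groupby('alleged_abduction').mean()
print(avg)
```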

In [8]:
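def find_most_abundant(df: pd.DataFrame, frac):
    """Returns the `frac` most abundant features for each row (group) of df.

    A value between 0 and 1 is treated as a fraction of all features,
    any other value as an absolute number of features.
    """
    if 0 < frac < 1:
        frac = int(frac * len(df.columns))
    print(f'Saving {frac} most abundant features...')
    most_abundant = {
        smp: df.loc[smp, :].sort_values(ascending=False)[:frac]
        for smp in df.index
    }
    return most_abundant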

ko_most_abundant = find_most_abundant(ko_meta_avg, 0.01)
ec_most_abundant = find_most_abundant(ec_meta_avg, 0.03)
pa_most_abundant = find_most_abundant(pa_meta_avg, 5)


Saving 102 most abundant features...
Saving 86 most abundant features...
Saving 5 most abundant features...
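
The two ways of passing `frac` (a fraction such as `0.01` or a count such as `5`) can be illustrated on a single toy row. This is a sketch only: `top_features` is a hypothetical helper, the values are made up, and it uses `Series.nlargest` rather than the sort-and-slice approach in the cell above.

```python
import pandas as pd


def top_features(row: pd.Series, frac: float) -> pd.Series:
    """Return the largest values of `row` (hypothetical helper).

    A value strictly between 0 and 1 is read as a fraction of all
    features; anything else is read as an absolute count.
    """
    n = int(frac * len(row)) if 0 < frac < 1 else int(frac)
    return row.nlargest(n)


# Toy group averages for five features (made-up values).
row = pd.Series({'K1': 5.0, 'K2': 50.0, 'K3': 1.0, 'K4': 20.0, 'K5': 9.0})

top_by_fraction = top_features(row, 0.4).index.tolist()  # 40% of 5 features
top_by_count = top_features(row, 3).index.tolist()       # absolute count
print(top_by_fraction, top_by_count)
```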


In [7]:
print(f'10 most abundant KOs in the abducted group are: {ko_most_abundant["abducted"].index[:10].tolist()}\n'
      f'10 most abundant KOs in the non-abducted group are: {ko_most_abundant["non_abducted"].index[:10].tolist()}\n')

10 most abundant KOs in the abducted group are: ['K01990', 'K06147', 'K07024', 'K01992', 'K02003', 'K02004', 'K02030', 'K02015', 'K02529', 'K01223']
10 most abundant KOs in the non-abducted group are: ['K01990', 'K01992', 'K06147', 'K02003', 'K07024', 'K02004', 'K03088', 'K02529', 'K16787', 'K02030']
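
The two lists overlap heavily, which is easier to see with set operations. A small sketch using the top-10 lists copied from the output above:

```python
# Top-10 KO lists copied from the output above.
abducted = ['K01990', 'K06147', 'K07024', 'K01992', 'K02003',
            'K02004', 'K02030', 'K02015', 'K02529', 'K01223']
non_abducted = ['K01990', 'K01992', 'K06147', 'K02003', 'K07024',
                'K02004', 'K03088', 'K02529', 'K16787', 'K02030']

shared = sorted(set(abducted) & set(non_abducted))
only_abducted = sorted(set(abducted) - set(non_abducted))
only_non_abducted = sorted(set(non_abducted) - set(abducted))

print(f'Shared: {len(shared)}')
print(f'Only in abducted top 10: {only_abducted}')
print(f'Only in non-abducted top 10: {only_non_abducted}')
```

Eight of the ten KOs are shared; the groups differ only in `K01223` and `K02015` (abducted) versus `K03088` and `K16787` (non-abducted).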



In [8]:
print(f'10 most abundant ECs in the abducted group are: {ec_most_abundant["abducted"].index[:10].tolist()}\n'
      f'10 most abundant ECs in the non-abducted group are: {ec_most_abundant["non_abducted"].index[:10].tolist()}\n')

10 most abundant ECs in the abducted group are: ['EC:3.6.4.12', 'EC:2.7.7.7', 'EC:2.7.13.3', 'EC:2.7.1.69', 'EC:2.7.7.6', 'EC:3.2.1.86', 'EC:2.1.1.72', 'EC:5.2.1.8', 'EC:3.4.16.4', 'EC:5.4.2.12']
10 most abundant ECs in the non-abducted group are: ['EC:2.7.7.7', 'EC:3.6.4.12', 'EC:2.7.13.3', 'EC:2.7.1.69', 'EC:2.7.7.6', 'EC:1.97.1.4', 'EC:2.1.1.72', 'EC:3.4.16.4', 'EC:5.4.2.12', 'EC:5.2.1.8']



In [9]:
for smp in ko_most_abundant.keys():
    fetch_ipath(ko_most_abundant[smp].index.tolist(), f'../plots/functional_prediction/kos_{smp}.svg')
    fetch_ipath(ec_most_abundant[smp].index.str.replace(':', '').tolist(), f'../plots/functional_prediction/ecs_{smp}.svg')

## 3. Analyzing the Effect of Disease Type

In this section, we compare the predicted functional profiles of the microbiota across disease types, separately for patients who were allegedly abducted by aliens and those who were not.

We start by keeping only the samples from abducted patients.

In [7]:
! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ko_metagenome.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where "[alleged_abduction]='abducted'" \
    --o-filtered-table $data_dir/picrust2_results/ko_metagenome_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ec_metagenome.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where  "[alleged_abduction]='abducted'" \
    --o-filtered-table $data_dir/picrust2_results/ec_metagenome_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where  "[alleged_abduction]='abducted'" \
    --o-filtered-table $data_dir/picrust2_results/pathway_abundance_abducted.qza

Saved FeatureTable[Frequency] to: ../data//picrust2_results/ko_metagenome_abducted.qza
Saved FeatureTable[Frequency] to: ../data//picrust2_results/ec_metagenome_abducted.qza
Saved FeatureTable[Frequency] to: ../data//picrust2_results/pathway_abundance_abducted.qza

We then do the same for non-abducted patients.

In [11]:
! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ko_metagenome.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where "[alleged_abduction]='non_abducted'" \
    --o-filtered-table $data_dir/picrust2_results/ko_metagenome_not_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ec_metagenome.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where  "[alleged_abduction]='non_abducted'" \
    --o-filtered-table $data_dir/picrust2_results/ec_metagenome_not_abducted.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --p-where  "[alleged_abduction]='non_abducted'" \
    --o-filtered-table $data_dir/picrust2_results/pathway_abundance_not_abducted.qza

Saved FeatureTable[Frequency] to: ../data//picrust2_results/ko_metagenome_not_abducted.qza
Saved FeatureTable[Frequency] to: ../data//picrust2_results/ec_metagenome_not_abducted.qza
Saved FeatureTable[Frequency] to: ../data//picrust2_results/pathway_abundance_not_abducted.qza

Now we can read in the three artifacts for each group using the QIIME 2 Python API and view them as DataFrames:

In [12]:
ko_a = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome_abducted.qza').view(pd.DataFrame)
ec_a = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome_abducted.qza').view(pd.DataFrame)
pa_a = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance_abducted.qza').view(pd.DataFrame)
ko_na = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome_not_abducted.qza').view(pd.DataFrame)
ec_na = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome_not_abducted.qza').view(pd.DataFrame)
pa_na = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance_not_abducted.qza').view(pd.DataFrame)
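
As a cross-check on the `filter-samples` calls, the same row selection can be sketched in pandas. The table and metadata below are made up, and the sketch only selects samples by their `alleged_abduction` value; it does not reproduce any other behavior of the QIIME 2 command.

```python
import pandas as pd

# Toy feature table and metadata (made-up sample IDs and values).
table = pd.DataFrame(
    {'K00001': [1.0, 2.0, 3.0]},
    index=['s1', 's2', 's3'],
)
meta = pd.DataFrame(
    {'alleged_abduction': ['abducted', 'non_abducted', 'abducted']},
    index=['s1', 's2', 's3'],
)

# Sample IDs in each group, restricted to samples present in the table.
abducted_ids = meta.index[meta['alleged_abduction'] == 'abducted']
other_ids = meta.index[meta['alleged_abduction'] == 'non_abducted']

table_a = table.loc[abducted_ids.intersection(table.index)]
table_na = table.loc[other_ids.intersection(table.index)]

print(table_a.index.tolist(), table_na.index.tolist())
```

Comparing the resulting sample counts with the shapes of the loaded artifacts is a quick sanity check that the `--p-where` clauses matched the intended metadata values.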

A brief look at each table shows that the values differ between abducted and non-abducted patients.

In [13]:
ko_a.head(2)

Unnamed: 0,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
0DOSLC,54182.069959,6.981562,114120.68968,72354.209497,45604.268712,10.652446,79362.866673,59942.895952,33794.194597,5.324617e-06,...,25.698715,37.359132,22.442708,5.201112,19.893207,27133.128495,0.007928,0.0,37.851223,0.0
0WG67D,36949.682025,6.063987,83110.648508,56205.161333,33241.112284,9.230694,44920.981973,38218.559997,28267.465068,4.644675e-20,...,15.415347,11.101046,11.187774,2.697171,10.796936,24703.23012,0.0,0.0,17.805109,0.003554


In [14]:
ko_na.head(2)

Unnamed: 0,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
0KB68F,1361.400386,22.47605,6638.807061,1283.583009,2819.624601,0.0,12651.473549,4992.0463,5519.033478,3.783984e-21,...,0.024526,0.0,0.0,0.0,609.994359,0.0,0.0,0.0,0.010882,0.00029
0TR6O2,337.235395,99.51724,1772.973885,63.402322,1388.255311,3.2e-05,1582.917987,383.045118,847.539563,0.0,...,0.0,0.0,0.0,0.0,51.894474,0.0,0.0,0.0,3.2e-05,0.0
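
The two previews do not list exactly the same columns: `K19787`, for example, appears in the non-abducted preview but not in the abducted one. Before comparing the tables feature by feature, it can help to bring them onto a shared set of columns. A minimal sketch on made-up tables, assuming that a feature missing from a table can be treated as zero abundance (that is an assumption of this sketch, not something the notebook states):

```python
import pandas as pd

# Toy tables with partly different feature columns (made-up values).
ko_a_toy = pd.DataFrame({'K19785': [0.1], 'K19788': [0.0]}, index=['a1'])
ko_na_toy = pd.DataFrame({'K19785': [0.0], 'K19787': [0.3]}, index=['n1'])

# Union of all feature IDs, in sorted order.
all_features = ko_a_toy.columns.union(ko_na_toy.columns)

# Reindex both tables to the shared columns; a missing feature becomes 0.
ko_a_aligned = ko_a_toy.reindex(columns=all_features, fill_value=0.0)
ko_na_aligned = ko_na_toy.reindex(columns=all_features, fill_value=0.0)

print(ko_a_aligned.columns.tolist())
```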


These tables contain information about different levels of the functional profiles:

1. `ko` table: columns represent KEGG orthologs, as indicated by their names (e.g., **K**19777)
2. `ec` table: columns represent enzymes, as indicated by the Enzyme Commission numbers (e.g., **EC**:1.1.1.108)
3. `pa` table: columns represent entire pathways using the MetaCyc classification (e.g., ANAGLYCOLYSIS-PWY)
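
The naming conventions in the list above are enough to tell the three levels apart from a feature ID alone. A rough sketch, assuming a hypothetical helper `feature_level` and treating every ID that is neither a KO nor an EC number as a pathway:

```python
import re


def feature_level(feature_id: str) -> str:
    """Guess the table a feature ID belongs to (hypothetical helper)."""
    # KEGG orthologs: the letter K followed by five digits, e.g. K19777.
    if re.fullmatch(r'K\d{5}', feature_id):
        return 'ko'
    # Enzyme Commission numbers carry an 'EC:' prefix, e.g. EC:1.1.1.108.
    if feature_id.startswith('EC:'):
        return 'ec'
    # Assumption: anything else is treated as a MetaCyc pathway ID.
    return 'pa'


for fid in ['K19777', 'EC:1.1.1.108', 'ANAGLYCOLYSIS-PWY']:
    print(fid, '->', feature_level(fid))
```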

<a id='ipath'></a>
### 3.1 Enriched KEGG orthologs visualization for Disease Types

We start by merging each feature table with the `disease` column from the metadata:

In [15]:
metadata = pd.read_csv(f'{data_dir}/metadata/str_metadata.tsv', sep='\t', header=0, index_col=0)

In [16]:
ko_a_meta = ko_a.merge(metadata[['disease']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['disease']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['disease']], left_index=True, right_index=True)
ko_na_meta = ko_na.merge(metadata[['disease']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['disease']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['disease']], left_index=True, right_index=True)

Next, we calculate the average abundance of each KO, EC and pathway per disease, separately for the abducted and non-abducted groups:

In [17]:
# collapse samples per disease - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('disease').mean()
ec_a_meta_avg = ec_a_meta.groupby('disease').mean()
pa_a_meta_avg = pa_a_meta.groupby('disease').mean()
ko_na_meta_avg = ko_na_meta.groupby('disease').mean()
ec_na_meta_avg = ec_na_meta.groupby('disease').mean()
pa_na_meta_avg = pa_na_meta.groupby('disease').mean()

We can now list the average KEGG ortholog abundances per disease for abducted and non-abducted patients. The averages differ between diseases within each group, and also between abducted and non-abducted patients with the same disease.

In [18]:
ko_a_meta_avg

disease,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
Hodgkin's Disease,34953.4687,2.55238,41150.899491,4825.747676,6031.352334,2.987571,2319.676886,1318.989544,6554.63751,1.033848e-21,...,516.795292,60.329399,242.171153,14.429723,57.305988,1022.062706,0.0,0.0,1147.702727,0.42267
Leukemia,12857.970171,87.614986,36454.100688,25317.228261,17271.679642,12.264023,26162.447626,23579.801853,13815.782371,0.0007105095,...,1459.920138,671.360458,696.323734,146.345722,601.635315,12723.246532,0.010643,3.690182e-59,1491.308203,0.000295
Myelodysplastic Syndromes,1994.688516,0.0,6361.559141,0.374649,3321.432803,0.0,19096.681963,25235.315634,8.007399,0.0,...,0.0,0.0,0.0,0.0,0.0,3297.361743,0.0,0.0,40.44593,0.0
Non-Hodgkin's Lymphoma,16468.199466,76.265046,41743.571134,4283.070794,15978.635289,0.0,53992.016472,63361.706906,1237.352134,1.9271329999999998e-21,...,2.694251,2.572491,3.047133,0.639329,2.559274,7232.517849,0.0,0.0,147.29541,0.006617


In [19]:
ko_na_meta_avg

disease,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
Hodgkin's Disease,19028.235594,1065.362017,34387.09214,5062.437658,14231.673873,0.909434,41327.780462,8815.128883,16317.379247,0.002319,...,357.612024,374.245338,82.476389,357.136535,1636.803313,0.0,0.0,0.0,764.903179,0.004102
Leukemia,17227.619162,423.810503,16639.077543,2499.003063,6982.46835,21.041076,18481.732069,11486.349201,9974.081115,0.003699,...,38.012421,371.852232,8.854783,35.672199,5126.323039,0.000112,0.000287,1.960334e-65,190.707274,0.002403
Myelodysplastic Syndromes,13056.800498,1193.270974,14331.828948,360.672058,22843.442305,0.0,28629.95561,3001.358,8689.148528,0.10126,...,0.0,0.180566,0.0,0.0,220.871235,0.0,0.0,0.0,1450.629391,0.0
Non-Hodgkin's Lymphoma,11439.37848,578.621286,26721.551141,3549.133715,21085.984097,1.041146,40508.693863,26224.881159,12625.656417,0.048428,...,352.133688,354.08343,87.898479,351.837638,4058.66147,0.00013,0.0,8.615288e-62,505.411611,0.001922


In [20]:
ko_a_most_abundant = find_most_abundant(ko_a_meta_avg, 0.01)
ec_a_most_abundant = find_most_abundant(ec_a_meta_avg, 0.03)
ko_na_most_abundant = find_most_abundant(ko_na_meta_avg, 0.01)
ec_na_most_abundant = find_most_abundant(ec_na_meta_avg, 0.03)


Saving 98 most abundant features...
Saving 83 most abundant features...
Saving 102 most abundant features...
Saving 85 most abundant features...


In [28]:
ko_a_most_abundant

{"Hodgkin's Disease": K07024    358482.184181
 K01990    343190.368966
 K02030    331013.996530
 K06147    268862.436560
 K01992    259433.682089
               ...      
 K07483     59614.291643
 K00973     59170.748358
 K07707     59161.106030
 K07260     58634.222123
 K02770     56391.474971
 Name: Hodgkin's Disease, Length: 98, dtype: float64,
 'Leukemia': K01990    294841.929329
 K06147    266550.480177
 K01992    257617.909624
 K07024    241744.607213
 K02003    180959.641465
               ...      
 K02495     56078.668617
 K01448     55649.191422
 K00655     55641.665805
 K03655     55427.531542
 K08884     55413.393737
 Name: Leukemia, Length: 98, dtype: float64,
 'Myelodysplastic Syndromes': K06147    52037.914787
 K01990    51967.581732
 K01992    50096.364734
 K03088    47892.255571
 K02003    46484.951237
               ...     
 K02015    15853.662795
 K01091    15804.047777
 K03704    15552.184685
 K02775    15418.701385
 K02783    15362.848506
 Name: Myelodysplastic Sy

In [29]:
print(f'10 most abundant KOs in abducted patients with Leukemia are: {ko_a_most_abundant["Leukemia"].index[:10].tolist()}\n')

10 most abundant KOs in abducted patients with Leukemia are: ['K01990', 'K06147', 'K01992', 'K07024', 'K02003', 'K02004', 'K02015', 'K02030', 'K02016', 'K02529']



### 3.2 Enriched pathways

We now visualize the pathway abundances across disease types for abducted and non-abducted patients.

In [24]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza

! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza

Saved FeatureTable[Composition] to: ../data//picrust2_results/pathway_abundance_abducted_abund.qza
Saved FeatureTable[Composition] to: ../data//picrust2_results/pathway_abundance_not_abducted_abund.qza

In [26]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column disease \
    --p-transform-function log \
    --o-visualization $data_dir/functional_prediction/pa_ancom_abducted_disease.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column disease \
    --p-transform-function log \
    --o-visualization $data_dir/functional_prediction/pa_ancom_not_abducted_disease.qzv

Saved Visualization to: ../data//functional_prediction/pa_ancom_abducted_disease.qzv
Saved Visualization to: ../data//functional_prediction/pa_ancom_not_abducted_disease.qzv

In [27]:
Visualization.load(f'{data_dir}/functional_prediction/pa_ancom_abducted_disease.qzv')


In [28]:
Visualization.load(f'{data_dir}/functional_prediction/pa_ancom_not_abducted_disease.qzv')

## 4. Functional Inference on Stool Consistency

In [39]:
metadata = pd.read_csv(f'{data_dir}/metadata/str_metadata.tsv', sep='\t', header=0, index_col=0)

In [41]:
ko_a_meta = ko_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ko_na_meta = ko_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['stool_consistency']], left_index=True, right_index=True)


In [42]:
# collapse samples per stool_consistency - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('stool_consistency').mean()
ec_a_meta_avg = ec_a_meta.groupby('stool_consistency').mean()
pa_a_meta_avg = pa_a_meta.groupby('stool_consistency').mean()
ko_na_meta_avg = ko_na_meta.groupby('stool_consistency').mean()
ec_na_meta_avg = ec_na_meta.groupby('stool_consistency').mean()
pa_na_meta_avg = pa_na_meta.groupby('stool_consistency').mean()

In [43]:
ko_a_meta_avg

stool_consistency,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
formed,6037.30679,773.65638,20189.709305,2051.262596,11894.709954,0.817587,29769.856206,16085.548045,12866.066883,0.040517,...,343.760265,346.83513,84.681018,343.471666,2282.388267,0.0,0.0,0.0,513.854295,0.000347
liquid,18708.284957,529.118973,28660.095238,2850.176466,21045.077804,17.110777,43873.721815,24820.053589,13028.297977,0.020591,...,15.9071,561.045843,3.651652,15.008893,2734.697251,0.0003854027,0.000478,3.267224e-65,341.382649,0.003919
semi-formed,27573.87634,190.813369,19447.748558,5223.539693,10406.258055,19.993414,15489.318023,10327.139551,8483.660044,5.8e-05,...,70.334807,85.938083,16.743543,66.983448,9193.116584,7.763663e-132,0.0,1.435881e-61,273.870191,0.003988


In [44]:
ko_na_meta_avg

stool_consistency,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
formed,10147.00615,38.277427,28631.098128,15020.481029,18270.32595,24.834485,22120.136212,22564.353026,14162.836028,1.333052e-21,...,3313.347149,1577.393486,1586.251307,343.85876,1414.225607,9928.389741,0.0,0.0,3350.177255,0.000102
liquid,17736.375188,93.757042,50932.307112,27855.87567,18077.871448,1.568945,49075.026567,48127.915567,9740.345325,0.001499965,...,130.917774,10.554224,50.823766,2.131272,8.428592,15485.064326,0.022468,7.790385e-59,243.304554,0.003744
semi-formed,13643.017988,99.836616,29046.519817,15155.787109,12060.491574,2.578626,18869.624145,19745.890068,8253.719387,6.1928999999999996e-21,...,64.856889,12.767183,37.792601,3.127061,12.390128,7047.888993,0.0,0.0,140.924726,0.047428


In [45]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column stool_consistency \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted_stool_consistency.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column stool_consistency \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_not_abducted_stool_consistency.qzv

Saved Visualization to: shared_data//pa_ancom_abducted_stool_consistency.qzv
Saved Visualization to: shared_data//pa_ancom_not_abducted_stool_consistency.qzv

In [46]:
Visualization.load(f'{data_dir}/pa_ancom_abducted_stool_consistency.qzv')

In [47]:
Visualization.load(f'{data_dir}/pa_ancom_not_abducted_stool_consistency.qzv')

## 5. Functional Inference on Categorical Time Relative to Engraftment


In [48]:
ko_a_meta = ko_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ec_a_meta = ec_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
pa_a_meta = pa_a.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ko_na_meta = ko_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
ec_na_meta = ec_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)
pa_na_meta = pa_na.merge(metadata[['categorical_time_relative_to_engraftment']], left_index=True, right_index=True)

In [49]:
# collapse samples per categorical_time_relative_to_engraftment - calculate average abundance

ko_a_meta_avg = ko_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
ec_a_meta_avg = ec_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
pa_a_meta_avg = pa_a_meta.groupby('categorical_time_relative_to_engraftment').mean()
ko_na_meta_avg = ko_na_meta.groupby('categorical_time_relative_to_engraftment').mean()
ec_na_meta_avg = ec_na_meta.groupby('categorical_time_relative_to_engraftment').mean()
pa_na_meta_avg = pa_na_meta.groupby('categorical_time_relative_to_engraftment').mean()

In [50]:
ko_a_meta_avg

categorical_time_relative_to_engraftment,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
peri,2114.150651,335.571804,6704.060157,648.476811,3622.080928,1.320137,12160.84125,7810.594355,5321.334654,1.1e-05,...,31.153772,46.877726,7.337195,29.352181,932.024107,0.0,0.0,0.0,81.919162,9.7e-05
post,38891.56807,1018.57399,48330.342573,7486.202129,32076.200421,29.300452,62746.992648,40142.072759,22515.633123,0.077798,...,556.583222,1015.152726,136.602004,554.0172,11353.594292,0.000323,0.0004,1.102229e-61,1114.385819,0.006685
pre,3349.070585,360.71458,10606.952331,1018.165276,5988.067033,0.59576,14760.258998,5155.612975,6561.072315,0.00112,...,14.079083,17.604554,3.483824,13.948624,614.715979,0.0,0.0,0.0,78.476669,8.2e-05


In [51]:
ko_na_meta_avg

categorical_time_relative_to_engraftment,K00001,K00002,K00003,K00004,K00005,K00007,K00008,K00009,K00010,K00011,...,K19776,K19777,K19778,K19779,K19780,K19784,K19785,K19788,K19789,K19791
peri,1071.346104,152.46512,8949.881683,2133.061097,4159.275767,0.0,20253.321044,9911.320896,7723.821127,1.03215e-21,...,2.666667,2.673357,2.666667,0.665968,2.66591,1445.448955,0.0,0.0,2.673435,7.9e-05
post,20671.457512,71.474948,50965.499933,28947.815703,21931.421817,12.972971,37881.680954,41307.699574,12875.523434,0.0007940989,...,1661.409336,754.177675,792.338813,164.482187,676.069153,15965.293801,0.011895,4.124322e-59,1776.659257,0.027125
pre,1493.628627,62.682144,9265.847245,1459.460499,5349.312766,2.577253,13952.401125,9902.369913,5532.318145,0.0,...,2.798408,0.0,1.632991,0.0,0.0,1079.559387,0.0,0.0,8.042315,0.0
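
The rows in these tables are sorted alphabetically (`peri`, `post`, `pre`) rather than in time order. If a chronological layout is preferred, the grouping column can be turned into an ordered categorical before averaging. A minimal sketch on made-up data, assuming the intended order is `pre`, `peri`, `post`:

```python
import pandas as pd

# Toy per-sample values with a time label (made-up numbers).
df = pd.DataFrame({
    'time': ['post', 'pre', 'peri', 'pre'],
    'K00001': [40.0, 2.0, 3.0, 4.0],
})

# Assumed chronological order of the three categories.
order = ['pre', 'peri', 'post']
df['time'] = pd.Categorical(df['time'], categories=order, ordered=True)

# observed=True keeps only categories that actually occur in the data.
avg = df.groupby('time', observed=True).mean()
print(avg.index.tolist())
```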


In [52]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column categorical_time_relative_to_engraftment \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abducted_categorical_time.qzv

! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_not_abducted_abund.qza \
    --m-metadata-file $data_dir/metadata/str_nan_metadata.tsv \
    --m-metadata-column categorical_time_relative_to_engraftment \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_not_abducted_categorical_time.qzv

Saved Visualization to: shared_data//pa_ancom_abducted_categorical_time.qzv
Saved Visualization to: shared_data//pa_ancom_not_abducted_categorical_time.qzv

In [53]:
Visualization.load(f'{data_dir}/pa_ancom_abducted_categorical_time.qzv')

In [54]:
Visualization.load(f'{data_dir}/pa_ancom_not_abducted_categorical_time.qzv')