# Metagenomics

This Jupyter notebook focuses on the analysis of metagenomics datasets.
Metagenomes give us a glimpse into not only the composition but also the functional potential of microbial communities.

We will use the PICRUSt2 plugin to *infer metagenomic data* by mapping 16S rRNA gene sequences to their nearest matching whole-genome sequences. For this, we will use the project's 16S rRNA gene dataset. We will then use the inferred data to gain insights into the functional potential of our community. PICRUSt2 generates feature tables containing abundance information about KEGG orthologs, enzymes and entire pathways, which we can later compare between samples and conditions.

**Notebook overview:**

[0. Setup](#setup)<br>
[1. Functional Inference](#picrust)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.1 Enriched KEGG orthologs](#ipath)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[1.2 Enriched pathways](#metacyc)<br>


Reference: PICRUSt2 on its [GitHub wiki](https://github.com/picrust/picrust2/wiki), [this tutorial](https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial) and the [Nature Biotechnology article](https://doi.org/10.1038/s41587-020-0548-6).

<a id='setup'></a>
## 0. Setup

In [1]:
import os
import pandas as pd
import qiime2 as q2
import requests

from qiime2 import Visualization

data_dir = 'project_data'
    
%matplotlib inline

In [2]:
def fetch_ipath(ids: list, img_output_path: str, verbose: bool = False):
    """Fetches an enriched pathways map from iPATH3 for the given IDs."""
    url = 'https://pathways.embl.de/mapping.cgi'
    
    # remove colon from EC names
    if ':' in ids[0]:
        ids = [x.replace(':', '') for x in ids]
    
    if verbose:
        print(f'Fetching iPATH3 diagram for ids: {ids}')
    params = {
        'default_opacity': 0.6,
        'export_type': 'svg',
        'selection': '\n'.join(ids)
    }   
    response = requests.get(url=url, params=params)
    # fail loudly instead of silently saving an HTTP error page as an SVG
    response.raise_for_status()
    
    with open(img_output_path, 'wb') as img:
        img.write(response.content)

In [3]:
# path to the picrust2 conda environment - do not change!
picrust_env = '/opt/conda/envs/picrust2/bin'

Download the `FeatureData[Sequence]` artifact for our data, which was created in FirstLook.ipynb:

In [9]:
! wget -nv -O $data_dir/rep-seqs.qza 'https://polybox.ethz.ch/index.php/s/MBLSUQXzglnn66u/download?path=%2F&files=Sequences_rep_set.qza'

2022-11-23 10:38:25 URL:https://polybox.ethz.ch/index.php/s/MBLSUQXzglnn66u/download?path=%2F&files=Sequences_rep_set.qza [390624/390624] -> "project_data/rep-seqs.qza" [1]


Peek at the downloaded artifact to confirm its type. The cell after that downloads the `FeatureTable[Frequency]`, which maps the dereplicated sequences to samples and was also created in FirstLook.ipynb:

In [10]:
! qiime tools peek $data_dir/rep-seqs.qza

UUID:        fd06ce7d-7b2d-4485-afda-fa50da61e9f4
Type:        FeatureData[Sequence]
Data format: DNASequencesDirectoryFormat


In [None]:
 ! wget -nv -O $data_dir/table.qza 'https://polybox.ethz.ch/index.php/s/MBLSUQXzglnn66u/download?path=%2F&files=Feature_table.qza'

Download the sample metadata for our data:

In [None]:
! wget -nv -O $data_dir/metadata.tsv 'https://polybox.ethz.ch/index.php/s/QqbHeUIpIR0okB8/download'

<a id='picrust'></a>
## 1. Functional inference

As mentioned in the introduction, we use PICRUSt2 to infer metagenome data from our 16S dataset. Execute the command below to run PICRUSt2; it will take approximately 30-40 minutes.

In [13]:
%%script env picrust_env="$picrust_env" data_dir="$data_dir" bash

# append the env location to PATH so that qiime
# can find all required executables
export PATH=$picrust_env:$PATH
    
echo $PATH
echo $picrust_env
echo $data_dir

$picrust_env/qiime picrust2 full-pipeline \
    --i-seq $data_dir/rep-seqs.qza \
    --i-table $data_dir/table.qza \
    --output-dir $data_dir/picrust2_results \
    --p-placement-tool sepp \
    --p-threads 2 \
    --p-hsp-method pic \
    --p-max-nsti 2 --verbose

/opt/conda/envs/picrust2/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/opt/conda/envs/picrust2/bin
project_data
Saved FeatureTable[Frequency] to: project_data/picrust2_results/ko_metagenome.qza
Saved FeatureTable[Frequency] to: project_data/picrust2_results/ec_metagenome.qza
Saved FeatureTable[Frequency] to: project_data/picrust2_results/pathway_abundance.qza



This is the set of poorly aligned input sequences to be excluded: 13f72db00c0dbaa8958b9f7918f0b374, 07beb5cc8e914bcdb53a22bb47519c6f, 000fa3dd1e0addc3fe1e3b5d1008afe8, 94346c6d9a95d3b3ae8cc5199de67117, 879643dd982eb501f3c81fca990fd518, acb8def29a97a1c91b3fc3a21f2caf2b, b957bbc0b2cf69471ecf66e8ad3e04bc, aff251a2cb898bb774dbdb48905421cb, fbb21b0edc5b0ef81226200b39c869f6, 70e4caa9982037a944a5d98bf98d55f8, 9ce8bef85c4979f8b7bc3cac7ed448c3, b4eb3cde9bea75d4b44aa58ca81582c0, cd4a519b47d3968e0691867615a5c176, 50494e8139cc7865daf756431a9ddbc4, 6b3d79b96bbe4081267cbf253f6acf4a, 2d156bf28e5958f61156702359c1bbc9, d59db3377c485ff138fab36dfff0829a, cd0b6534edf1e39f840043cc9ff16245, b53340e67fe74981e67adddc676e0a70, aaf3da224d70d78e682e45aee3ffdd11, ea8eb3076951ea382ced8d12e64a09b8, b4af13714047999d9c1f412ef2ee40fc, 7c22b912d937a2a56abf575d9a74a926, 800ee92d16f9ed811de6cdcbc6743a02, 1cc76c8348170e893ed906f3887a8e85, a09ceaed5bdfdef2d6ee2b723513e355, 5dcdf86dcd2943dc8f27273f780dd0bf, e892639661f6691

In the `picrust2_results` subdirectory we get 3 artifacts - all of them are of the `FeatureTable[Frequency]` semantic type and hold abundance information for various functional features:

1. KO metagenome - abundances of [KEGG](https://www.kegg.jp) orthologs
2. EC metagenome - abundances of enzymes (you could look some of them up [here](https://www.brenda-enzymes.org))
3. MetaCyc pathways - abundance of [MetaCyc](https://metacyc.org) pathways

These are essentially three different summaries of the same data: PICRUSt2 maps your amplicon sequences to a full-genome reference database to infer the metagenome composition of a community, and then summarizes this information according to the estimated abundances of (1) the orthologous genes, (2) the enzymes that they encode, and (3) the metabolic pathways that they belong to.
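The core arithmetic behind this inference can be illustrated with toy numbers. The sketch below is only an illustration of the principle and not PICRUSt2's actual code: all ASV names, copy numbers and abundances are made up, and the real pipeline additionally performs phylogenetic placement and hidden-state prediction to obtain the copy numbers.

```python
import pandas as pd

# Hypothetical ASV counts per sample (rows: samples, columns: ASVs).
asv_abund = pd.DataFrame(
    {'ASV1': [10, 0], 'ASV2': [20, 8]},
    index=['S1', 'S2'],
)

# Hypothetical predicted 16S rRNA gene copy number per ASV genome.
copies_16s = pd.Series({'ASV1': 2, 'ASV2': 4})

# Hypothetical predicted KO copy numbers per ASV genome (rows: ASVs).
ko_copies = pd.DataFrame(
    {'K00001': [1, 0], 'K00002': [2, 3]},
    index=['ASV1', 'ASV2'],
)

# 1) Correct ASV counts for 16S copy number to approximate organism abundance.
normalized = asv_abund.div(copies_16s, axis='columns')

# 2) Multiply organism abundance by gene content to get KO abundance per sample.
ko_metagenome_toy = normalized.dot(ko_copies)
print(ko_metagenome_toy)
```

Because of the copy-number correction and the predicted (non-integer) gene contents, the resulting abundances are generally not whole numbers, which is why the tables below contain fractional values.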


In [5]:
metadata = pd.read_csv(f'{data_dir}/metadata.tsv', sep='\t', header=0, index_col=0)

Let's look at the metadata briefly to see what kind of categories we have available:

In [6]:
! qiime metadata tabulate \
    --m-input-file $data_dir/metadata.tsv \
    --o-visualization $data_dir/metadata.qzv

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved Visualization to: w9_data/metadata.qzv

In [7]:
Visualization.load(f'{data_dir}/metadata.qzv')
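
As an alternative to the interactive visualization, the available categories can also be summarized directly with pandas. This is a minimal sketch on a made-up metadata table: only the column names `mom_or_child` and `abx_exposure` come from this notebook, while the sample IDs and the value `'M'` for mothers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical metadata; only the column names are taken from this notebook.
toy_metadata = pd.DataFrame(
    {
        'mom_or_child': ['C', 'C', 'M', 'C'],
        'abx_exposure': ['yes', 'no', 'no', 'no'],
    },
    index=['s1', 's2', 's3', 's4'],
)

# Count how many samples fall into each category of every column.
category_counts = {
    col: toy_metadata[col].value_counts().to_dict()
    for col in toy_metadata.columns
}
print(category_counts)
```

The same dictionary comprehension could be applied to the `metadata` DataFrame loaded above, ideally restricted to its categorical columns.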

This dataset contains samples from both mothers and children. Let's first filter the tables on the metadata column `mom_or_child` to keep only the samples from children:


In [8]:
! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ko_metagenome.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[mom_or_child]='C'" \
    --o-filtered-table $data_dir/picrust2_results/ko_metagenome_child.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/ec_metagenome.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[mom_or_child]='C'" \
    --o-filtered-table $data_dir/picrust2_results/ec_metagenome_child.qza

! qiime feature-table filter-samples \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --p-where "[mom_or_child]='C'" \
    --o-filtered-table $data_dir/picrust2_results/pathway_abundance_child.qza

Saved FeatureTable[Frequency] to: w9_data/picrust2_results/ko_metagenome_child.qza
Saved FeatureTable[Frequency] to: w9_data/picrust2_results/ec_metagenome_child.qza
Saved FeatureTable[Frequency] to: w9_data/picrust2_results/pathway_abundance_child.qza
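
For intuition, the sample filtering done by the commands above can be mimicked in pandas. This is a rough sketch on made-up data, not what QIIME 2 runs internally: the sample IDs, the abundances and the value `'M'` for mothers are assumptions.

```python
import pandas as pd

# Hypothetical feature table (rows: samples) and metadata.
toy_table = pd.DataFrame(
    {'K00001': [5.0, 1.0, 3.0], 'K00002': [0.0, 2.0, 4.0]},
    index=['s1', 's2', 's3'],
)
toy_metadata = pd.DataFrame(
    {'mom_or_child': ['C', 'M', 'C']},
    index=['s1', 's2', 's3'],
)

# Select the IDs of child samples, mirroring --p-where "[mom_or_child]='C'".
child_ids = toy_metadata.index[toy_metadata['mom_or_child'] == 'C']

# Keep only those samples that are present in the table as well.
toy_table_child = toy_table.loc[toy_table.index.intersection(child_ids)]
print(toy_table_child)
```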

Now we can read in all three artifacts using the QIIME 2 Python API and view them as DataFrames:

In [9]:
ko = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome_child.qza').view(pd.DataFrame)
ec = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome_child.qza').view(pd.DataFrame)
pa = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance_child.qza').view(pd.DataFrame)

Let's briefly examine the contents of each of those tables:

In [10]:
ko.head()

Unnamed: 0,K00001,K00002,K00003,K00004,K00005,K00006,K00007,K00008,K00009,K00010,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
12802.10249.C007.08SS,5567.954253,5.019038e-12,2325.866198,38.871277,1285.710309,0.0,0.006192,2528.428697,537.601025,5883.369401,...,140.49,141.997581,1.281908,1.342215,179.610288,1.690535e-126,0.0,3.126628e-56,628.585652,0.0
12802.10249.C047.08SD,11520.033699,0.0420454,6935.286082,545.455569,5381.682927,0.0,31.394071,13172.555257,1087.762298,18278.463532,...,709.74,751.626786,6.476058,6.780721,1015.751728,2.4668220000000004e-125,0.0,4.562361e-55,882.029769,0.011715
12802.10249.C016.09SS,3728.634346,0.05609497,1619.58097,50.052374,1339.718791,0.0,0.000525,1541.558854,318.00426,2927.011903,...,167.26,225.001876,1.526172,1.59797,235.071529,0.0,0.0,0.0,647.627117,0.015629
12802.10249.C023.01SS.r,14171.783597,0.005405804,14223.852495,164.773784,20200.554172,0.0,0.198871,15072.763747,11257.782485,10251.098376,...,2.099986e-11,6891.132835,1.0898219999999999e-38,1.099425e-38,7056.813543,5.845949000000001e-125,0.0,1.0812020000000001e-54,7056.064955,0.0
12802.10249.C016.02SS,8978.181164,0.2592972,385.169375,76.058317,2169.988374,0.0,1e-06,115.751311,1841.813426,10554.755125,...,0.43,0.43,0.003923556,0.004108138,1883.914364,0.0,0.0,0.0,10582.089272,0.072246


In [11]:
ec.head()

Unnamed: 0,EC:1.1.1.1,EC:1.1.1.10,EC:1.1.1.100,EC:1.1.1.101,EC:1.1.1.102,EC:1.1.1.103,EC:1.1.1.105,EC:1.1.1.107,EC:1.1.1.108,EC:1.1.1.11,...,EC:6.4.1.8,EC:6.5.1.1,EC:6.5.1.2,EC:6.5.1.3,EC:6.5.1.4,EC:6.5.1.5,EC:6.5.1.6,EC:6.5.1.7,EC:6.6.1.1,EC:6.6.1.2
12802.10249.C007.08SS,7897.840792,3.369554e-14,12487.424022,3.6940449999999996e-21,3.8248320000000006e-17,832.898971,7.185326e-58,4.973201e-42,45.125923,0.006192,...,2.454161e-126,2959.472997,5715.377609,735.89188,140.49,1.744875e-55,592.231351,592.231351,2076.934692,4708.662793
12802.10249.C047.08SD,24626.037961,2.849515e-12,46312.065076,4.425015e-19,14.34333,5925.578564,0.003295164,7.256871999999999e-41,587.540343,31.394071,...,3.5811000000000004e-125,17573.928571,8968.238204,6995.54268,709.74,2.5461130000000002e-54,3525.777322,3525.777322,5091.519706,4469.440962
12802.10249.C016.09SS,5569.431056,0.0,11419.637939,4.0576989999999996e-20,0.0472373,733.505382,0.004396251,0.01575014,3.453316,0.000525,...,0.8444625,3414.987321,3564.27549,2095.162792,167.26,0.0,684.012633,684.012633,560.375826,3294.987828
12802.10249.C023.01SS.r,59075.224937,0.0046267,41865.923472,3.053311e-59,1.322644e-15,7066.340955,2.484719e-56,1.719756e-40,125.001845,0.198871,...,8.4866e-125,112.160515,34340.856556,20.812133,3.450117e-14,6.033856e-54,50.709452,50.709452,489.074758,254.299332
12802.10249.C016.02SS,15701.889796,0.0,39636.042706,0.0,0.2183529,1869.248328,0.02032153,0.0,0.574041,1e-06,...,0.0,8890.334914,12857.429727,17296.995562,0.43,0.0,89.358157,89.358157,161.8298,8875.898259


In [12]:
pa.head()

Unnamed: 0,1CMET2-PWY,3-HYDROXYPHENYLACETATE-DEGRADATION-PWY,AEROBACTINSYN-PWY,ALL-CHORISMATE-PWY,ANAEROFRUCAT-PWY,ANAGLYCOLYSIS-PWY,ARG+POLYAMINE-SYN,ARGDEG-PWY,ARGORNPROST-PWY,ARGSYN-PWY,...,THISYN-PWY,THREOCAT-PWY,THRESYN-PWY,TRNA-CHARGING-PWY,TRPSYN-PWY,TYRFUMCAT-PWY,UBISYN-PWY,UDPNAGSYN-PWY,VALDEG-PWY,VALSYN-PWY
12802.10249.C007.08SS,5721.150714,40.164924,176.453713,892.067646,5803.184321,8384.533258,1901.871623,170.668432,1459.089747,3832.430825,...,6429.985689,763.49697,7196.8122,5909.85008,5528.434885,655.462302,347.501653,3012.045969,0.0,7371.406577
12802.10249.C047.08SD,9745.572172,137.653716,857.836365,3071.566112,8917.89003,10673.304833,3784.762421,805.714267,6379.027453,9081.699754,...,5059.626597,2098.323229,8708.013402,8771.517526,6075.568869,2191.585047,1567.54451,8788.977067,0.0,15823.803428
12802.10249.C016.09SS,3796.085925,37.890835,173.726538,799.985385,3861.503242,4267.790142,1605.22715,168.404396,1249.266206,2913.491399,...,3940.863509,730.602581,4423.751005,3789.852193,3625.042844,638.681416,378.810697,2298.307895,0.0,4896.382036
12802.10249.C023.01SS.r,16983.970848,9284.27598,124.761968,11415.868042,21557.630448,26461.27088,19870.971022,704.733374,9505.241387,20467.263931,...,20604.338674,12114.138031,22024.507339,23399.016395,14859.738689,84.035307,8529.040805,22072.739664,2.5e-05,20703.022534
12802.10249.C016.02SS,11185.554451,2038.145019,1854.101932,4469.036048,7981.193603,14192.080678,3448.809531,2358.772474,817.240354,7305.461791,...,13458.819467,629.375362,13840.200805,11476.898863,12173.033233,87.966574,2856.684365,3267.442875,0.0,15184.583776


You can see that they look just like the other feature tables we worked with before. The difference is that they now contain information not about ASVs but about different levels of the functional profiles:

1. `ko` table: columns represent KEGG orthologs, as indicated by their names (e.g., **K**19777)
2. `ec` table: columns represent enzymes, as indicated by the Enzyme Commission numbers (e.g., **EC**:1.1.1.108)
3. `pa` table: columns represent entire pathways using the MetaCyc classification (e.g., ANAGLYCOLYSIS-PWY)

<a id='ipath'></a>
### 1.1 Enriched KEGG orthologs visualization

Unfortunately, QIIME 2 does not yet support visualizing enriched KO terms and pathways. We can, however, use other tools for that. Let's find the most abundant KEGG orthologs and plot them for both groups: samples with and without antibiotic treatment.

We start by merging our feature table with the treatment column (`abx_exposure`) from the metadata:

In [13]:
ko_meta = ko.merge(metadata[['abx_exposure']], left_index=True, right_index=True)
ec_meta = ec.merge(metadata[['abx_exposure']], left_index=True, right_index=True)
pa_meta = pa.merge(metadata[['abx_exposure']], left_index=True, right_index=True)

Next, we will calculate an average abundance of each KO, EC and pathway in each group (treatment vs. no treatment):

In [14]:
# collapse samples per abx_exposure group - calculate average abundance

ko_meta_avg = ko_meta.groupby('abx_exposure').mean()
ec_meta_avg = ec_meta.groupby('abx_exposure').mean()
pa_meta_avg = pa_meta.groupby('abx_exposure').mean()

In [15]:
ko_meta_avg.head()

abx_exposure,K00001,K00002,K00003,K00004,K00005,K00006,K00007,K00008,K00009,K00010,...,K19777,K19778,K19779,K19780,K19784,K19785,K19787,K19788,K19789,K19791
no,5996.952649,64.834454,5597.43127,485.426796,4328.489754,0.084168,78.447254,6670.759135,1663.397132,7293.26425,...,811.272558,824.909331,7.015205,22.286485,1251.160205,0.108677,3.430599e-07,0.073142,1742.536652,0.17058
yes,5468.855563,22.516847,5780.651032,529.552847,4506.730587,0.091583,111.463141,5782.298698,1706.715201,6230.003595,...,630.13995,722.157928,4.416278,171.639755,1049.364526,0.08124,9.190191e-08,0.011234,1739.239233,0.013455
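
The merge-then-average pattern used above can be checked on a small made-up table. In this sketch the sample IDs and abundances are hypothetical; it shows that `merge` performs an inner join by default, so samples without a metadata entry silently drop out before averaging.

```python
import pandas as pd

# Hypothetical KO table; sample 's4' has no metadata entry.
toy_ko = pd.DataFrame(
    {'K00001': [2.0, 4.0, 6.0, 8.0], 'K00002': [1.0, 1.0, 3.0, 5.0]},
    index=['s1', 's2', 's3', 's4'],
)
toy_metadata = pd.DataFrame(
    {'abx_exposure': ['yes', 'yes', 'no']},
    index=['s1', 's2', 's3'],
)

# Inner join on the sample IDs: 's4' is dropped because it lacks metadata.
toy_merged = toy_ko.merge(toy_metadata, left_index=True, right_index=True)

# Average every feature within each exposure group.
toy_avg = toy_merged.groupby('abx_exposure').mean()
print(toy_avg)
```

Comparing `len(ko)` with `len(ko_meta)` is a quick way to see whether any samples were lost in the merge.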


Finally, let's find the most abundant features in each table:

In [16]:
# find the top fraction (0 < frac < 1) or the top N (frac >= 1) of the most abundant KOs, ECs and pathways in each group

def find_most_abundant(df: pd.DataFrame, frac):
    if 0 < frac < 1:
        frac = int(frac * len(df.columns))
    print(f'Saving {frac} most abundant features...')
    most_abundant = {
        smp: df.loc[smp, :].sort_values(ascending=False)[:frac]
        for smp in df.index
    }
    return most_abundant

ko_most_abundant = find_most_abundant(ko_meta_avg, 0.01)
ec_most_abundant = find_most_abundant(ec_meta_avg, 0.03)
pa_most_abundant = find_most_abundant(pa_meta_avg, 5)

Saving 103 most abundant features...
Saving 86 most abundant features...
Saving 5 most abundant features...
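
The selection logic can be tried out on a tiny table. The sketch below uses a hypothetical helper `top_features` and made-up values; it follows the same convention as `find_most_abundant` (a value below 1 is read as a fraction of the columns) but relies on `Series.nlargest` instead of sorting the whole row.

```python
import pandas as pd

# Hypothetical group averages (rows: groups, columns: features).
toy_avg = pd.DataFrame(
    {'K1': [5.0, 1.0], 'K2': [3.0, 9.0], 'K3': [8.0, 2.0], 'K4': [0.5, 4.0]},
    index=['no', 'yes'],
)

def top_features(df: pd.DataFrame, n: float) -> dict:
    """Return the top features per row; n < 1 is read as a fraction of columns."""
    if 0 < n < 1:
        n = int(n * len(df.columns))
    return {grp: df.loc[grp].nlargest(int(n)).index.tolist() for grp in df.index}

toy_top = top_features(toy_avg, 0.5)  # 50% of 4 features -> 2 per group
print(toy_top)
```

Note that `int()` truncates, so a small fraction applied to a narrow table can yield zero features.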


In [17]:
print(f'10 most abundant KOs in the treatment group are: {ko_most_abundant["yes"].index[:10].tolist()}\n'
      f'10 most abundant KOs in the non-treatment group are: {ko_most_abundant["no"].index[:10].tolist()}\n')

10 most abundant KOs in the treatment group are: ['K03088', 'K02004', 'K01992', 'K01990', 'K06147', 'K02003', 'K02529', 'K02015', 'K07024', 'K02016']
10 most abundant KOs in the non-treatment group are: ['K02004', 'K03088', 'K01992', 'K01990', 'K06147', 'K02003', 'K02529', 'K02015', 'K07024', 'K05349']
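
The two lists are nearly identical, so it helps to compute the differences explicitly. A minimal sketch using the lists copied from the output above:

```python
# Top-10 KO lists copied from the output above.
top_yes = ['K03088', 'K02004', 'K01992', 'K01990', 'K06147',
           'K02003', 'K02529', 'K02015', 'K07024', 'K02016']
top_no = ['K02004', 'K03088', 'K01992', 'K01990', 'K06147',
          'K02003', 'K02529', 'K02015', 'K07024', 'K05349']

# Set arithmetic separates group-specific KOs from shared ones.
only_yes = sorted(set(top_yes) - set(top_no))
only_no = sorted(set(top_no) - set(top_yes))
shared = sorted(set(top_yes) & set(top_no))

print(f'Only in treated top 10: {only_yes}')
print(f'Only in non-treated top 10: {only_no}')
print(f'Shared: {len(shared)} KOs')
```

The same approach applied to the full `ko_most_abundant` and `ec_most_abundant` lists gives candidate IDs to look up on the interactive map.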



To visualize the KOs and ECs we can use the [Interactive Pathway Explorer](https://pathways.embl.de) (iPath 3). First, we will fetch pathway maps for KOs and ECs per sample group and display them as SVGs. Then, you can try copying some IDs and trying out the interactive pathway map available on the web page linked above - it will allow you to zoom into different areas of the map and look at some more interesting details.

In [18]:
for smp in ko_most_abundant.keys():
    fetch_ipath(ko_most_abundant[smp].index.tolist(), f'{data_dir}/kos_{smp}.svg')
    fetch_ipath(ec_most_abundant[smp].index.str.replace(':', '').tolist(), f'{data_dir}/ecs_{smp}.svg')

This should have fetched 4 pathway maps: one per combination of sample group (treated vs. non-treated) and feature table (KO vs. EC). These are rather large images, so displaying them inline here would make the maps unreadable. Instead, locate them in the data folder (`data_dir`, here `project_data`) and open them externally.

#### 1.1.1 Checkpoint A
Qualitatively speaking, do the maps look the same between treated and non-treated samples? Can you identify some enzymes that differ between the two? (you may need to look at the interactive map to find those)

**Answer:**
ec_yes contains red lines in the areas of Glycerolipid metabolism/Glycerophospholipid metabolism (enzyme 1-acylglycerol-3-phosphate O-acyltransferase), Phenylalanine, tyrosine and tryptophan biosynthesis (enzyme 3-deoxy-7-phosphoheptulonate synthase) and Cysteine and methionine metabolism (enzyme DNA (cytosine-5-)-methyltransferase), whereas these lines are absent from the pathway map of ec_no (non-treated).

Comparing kos_yes and kos_no, there is again a red line in Glycerolipid metabolism/Glycerophospholipid metabolism for the enzyme 1-acylglycerol-3-phosphate O-acyltransferase in the treated but not in the non-treated samples. The opposite holds for Pentose and glucuronate interconversions: the enzyme xylulokinase is found in the non-treated but not in the treated samples.

<a id='metacyc'></a>
### 1.2 Enriched pathways

To get a bigger picture, we can also look at the most abundant pathways in both sample groups. Use one of the functions defined above to identify the four pathways that are most abundant in the treated and non-treated samples.

In [19]:
print(f'4 most abundant pathways in the treatment group are: {pa_most_abundant["yes"].index[:4].tolist()}\n'
      f'4 most abundant pathways in the non-treatment group are: {pa_most_abundant["no"].index[:4].tolist()}\n')

4 most abundant pathways in the treatment group are: ['PWY-7111', 'NONOXIPENT-PWY', 'PWY-5101', 'ILEUSYN-PWY']
4 most abundant pathways in the non-treatment group are: ['PWY-7111', 'NONOXIPENT-PWY', 'PWY-5101', 'VALSYN-PWY']



Last week we learned how to identify differentially abundant features between samples. Let's use this approach here to find whether there are pathways that differ significantly between samples with and without antibiotic treatment.

In [20]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance_child.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_child_abund.qza

Saved FeatureTable[Composition] to: w9_data/picrust2_results/pathway_abundance_child_abund.qza
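
The pseudocount step is needed because ANCOM is run here with a log transform, and the logarithm of zero is undefined. A minimal sketch of the idea in plain Python, assuming a pseudocount of 1 (this value and the abundances are assumptions for illustration, not read from the plugin):

```python
import math

# Hypothetical pathway abundances for one sample, including a zero.
abundances = [0.0, 5.0, 20.0]

# math.log(0.0) raises a ValueError, so zeros must be handled first.
try:
    math.log(abundances[0])
    zero_is_loggable = True
except ValueError:
    zero_is_loggable = False

# Assumed pseudocount of 1, added to every entry before the log transform.
pseudocount = 1
log_abundances = [math.log(x + pseudocount) for x in abundances]

print(zero_is_loggable)
print(log_abundances)
```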

In [21]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_child_abund.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column abx_exposure \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_abx.qzv

Saved Visualization to: w9_data/pa_ancom_abx.qzv

In [22]:
Visualization.load(f'{data_dir}/pa_ancom_abx.qzv')

Too bad! No significant differences :(

Now repeat these steps but compare mothers and children. We will do the pseudocount addition for you, but you do the rest.

In [25]:
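# note: we are going back to the original pathway abundance table,
# which contains both mothers and children. We used the filtered table above.
# Caution: the output path equals the input path, so this command overwrites
# the original FeatureTable[Frequency] with the FeatureTable[Composition].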
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance.qza

Saved FeatureTable[Composition] to: w9_data/picrust2_results/pathway_abundance.qza

In [28]:
# Your turn! fill in the blanks to run ancom and visualize the results.
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --m-metadata-file $data_dir/metadata.tsv \
    --m-metadata-column mom_or_child \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_mom_or_child.qzv


Saved Visualization to: w9_data/pa_ancom_mom_or_child.qzv

In [29]:
Visualization.load(f'{data_dir}/pa_ancom_mom_or_child.qzv')

#### 1.2.1 Checkpoint B

1. Were any inferred pathways significantly differentially abundant? If so, which pathway had the highest W value? Was this pathway more abundant in mothers or children? (In the ancom visualization, scroll down to the table `Percentile abundances of features by group`.)

**Answer:** Many pathways are significantly differentially abundant between mothers and children. The highest W value (488) was found for the VALDEG-PWY pathway, which was more abundant in mothers.