### Metagenomics

In [1]:
import os
import pandas as pd
import qiime2 as q2
import requests

from qiime2 import Visualization

data_dir = 'CE'
    
%matplotlib inline

In [2]:
def fetch_ipath(ids: list, img_output_path: str, verbose: bool = False):
    """Fetches a enriched pathways map from iPATH3 for given IDs."""
    url = 'https://pathways.embl.de/mapping.cgi'
    
    # remove colon from EC names
    if ':' in ids[0]:
        ids = [x.replace(':', '') for x in ids]
    
    if verbose:
        print(f'Fetching iPATH3 diagram for ids: {ids}')
    params = {
        'default_opacity': 0.6,
        'export_type': 'svg',
        'selection': '\n'.join(ids)
    }   
    response = requests.get(url=url, params=params)
    # fail loudly instead of writing an error page into the SVG file
    response.raise_for_status()
    
    with open(img_output_path, 'wb') as img:
        img.write(response.content)
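
Note that the colon check in `fetch_ipath` only inspects the first ID: an empty list raises `IndexError`, and a mixed list is cleaned only if its first entry happens to contain a colon. A minimal sketch of a more defensive variant follows; `strip_colons` is a hypothetical helper name, not part of the notebook or of any library.

```python
def strip_colons(ids):
    """Return a copy of ids with every colon removed, e.g. 'EC:1.1.1.108' -> 'EC1.1.1.108'."""
    # str.replace is a no-op for IDs without a colon, so no up-front check is needed
    return [x.replace(':', '') for x in ids]


print(strip_colons(['K19777', 'EC:1.1.1.108']))
print(strip_colons([]))
```

Because the replacement is applied to every element unconditionally, the helper behaves the same for KO-only, EC-only, mixed and empty lists.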

In [3]:
picrust_env = '/opt/conda/envs/picrust2/bin'

#### Functional inference

In [4]:
%%script env picrust_env="$picrust_env" data_dir="$data_dir" bash

# append the env location to PATH so that qiime
# can find all required executables
export PATH=$picrust_env:$PATH

$picrust_env/qiime picrust2 full-pipeline \
    --i-seq $data_dir/dada2_rep_set_filtered.qza \
    --i-table $data_dir/dada2_table_align_filtered.qza \
    --output-dir $data_dir/picrust2_results \
    --p-placement-tool sepp \
    --p-threads 2 \
    --p-hsp-method pic \
    --p-max-nsti 2 

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.


Saved FeatureTable[Frequency] to: CE/picrust2_results/ko_metagenome.qza
Saved FeatureTable[Frequency] to: CE/picrust2_results/ec_metagenome.qza
Saved FeatureTable[Frequency] to: CE/picrust2_results/pathway_abundance.qza


##### Download files

In [5]:
! wget -nv -O $data_dir/picrust2_results/metagenomics.zip 'https://polybox.ethz.ch/index.php/s/9IoT5okOckQUCl5/download'

2022-11-23 10:40:45 URL:https://polybox.ethz.ch/index.php/s/9IoT5okOckQUCl5/download [47323823] -> "CE/picrust2_results/metagenomics.zip" [1]


In [6]:
! unzip -q $data_dir/picrust2_results/metagenomics.zip -d $data_dir
! rm $data_dir/picrust2_results/metagenomics.zip

##### Visualize metadata

In [9]:
metadata = pd.read_csv(f'{data_dir}/food-metadata.tsv', sep='\t', header=0, index_col=0)

In [3]:
! qiime metadata tabulate \
    --m-input-file $data_dir/food-metadata.tsv \
    --o-visualization $data_dir/food-metadata.qzv

Saved Visualization to: CE/food-metadata.qzv

In [4]:
Visualization.load(f'{data_dir}/food-metadata.qzv')

In [None]:
# Nothing has been filtered yet (this cell was only copy-pasted).
# ! qiime feature-table filter-samples \
#     --i-table $data_dir/picrust2_results/ko_metagenome.qza \
#     --m-metadata-file $data_dir/metadata.tsv \
#     --p-where "[mom_or_child]='C'" \
#     --o-filtered-table $data_dir/picrust2_results/ko_metagenome_child.qza

# ! qiime feature-table filter-samples \
#     --i-table $data_dir/picrust2_results/ec_metagenome.qza \
#     --m-metadata-file $data_dir/metadata.tsv \
#     --p-where "[mom_or_child]='C'" \
#     --o-filtered-table $data_dir/picrust2_results/ec_metagenome_child.qza

# ! qiime feature-table filter-samples \
#     --i-table $data_dir/picrust2_results/pathway_abundance.qza \
#     --m-metadata-file $data_dir/metadata.tsv \
#     --p-where "[mom_or_child]='C'" \
#     --o-filtered-table $data_dir/picrust2_results/pathway_abundance_child.qza

In [7]:
# Let's look at the unfiltered picrust2 results

ko = q2.Artifact.load(f'{data_dir}/picrust2_results/ko_metagenome.qza').view(pd.DataFrame)
ec = q2.Artifact.load(f'{data_dir}/picrust2_results/ec_metagenome.qza').view(pd.DataFrame)
pa = q2.Artifact.load(f'{data_dir}/picrust2_results/pathway_abundance.qza').view(pd.DataFrame)


Let's have a look at these tables. They no longer contain any information about ASVs; instead, each one describes a different level of the functional profile:

1. `ko` table: columns represent KEGG orthologs, as indicated by their names (e.g., **K**19777)
2. `ec` table: columns represent enzymes, as indicated by the Enzyme Commission numbers (e.g., **EC**:1.1.1.108)
3. `pa` table: columns represent entire pathways using the MetaCyc classification (e.g., ANAGLYCOLYSIS-PWY)
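
The naming conventions listed above are enough to tell the three feature types apart from the column name alone. A minimal sketch, assuming KO IDs are always `K` followed by five digits, EC IDs always start with `EC:`, and anything else is a MetaCyc pathway; `feature_kind` is a hypothetical helper, not part of the notebook.

```python
import re


def feature_kind(name: str) -> str:
    """Guess the functional level of a column name from its prefix."""
    if re.fullmatch(r'K\d{5}', name):
        return 'KEGG ortholog'
    if name.startswith('EC:'):
        return 'enzyme (EC number)'
    # assumption: everything else in these tables is a MetaCyc pathway ID
    return 'MetaCyc pathway'


for col in ['K19777', 'EC:1.1.1.108', 'ANAGLYCOLYSIS-PWY']:
    print(col, '->', feature_kind(col))
```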

Let's merge each feature table with the `pasteurized` column from the metadata.

In [10]:
ko_meta = ko.merge(metadata[['pasteurized']], left_index=True, right_index=True)
ec_meta = ec.merge(metadata[['pasteurized']], left_index=True, right_index=True)
pa_meta = pa.merge(metadata[['pasteurized']], left_index=True, right_index=True)

In [11]:
ko_meta_avg = ko_meta.groupby('pasteurized').mean()
ec_meta_avg = ec_meta.groupby('pasteurized').mean()
pa_meta_avg = pa_meta.groupby('pasteurized').mean()
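
One detail of the merge worth keeping in mind: `DataFrame.merge` performs an inner join by default, so samples that have no row in the metadata are silently dropped before the group means are computed. A minimal sketch on toy data; the sample IDs and values are made up for illustration.

```python
import pandas as pd

# Toy feature table: rows are samples, columns are features (made-up values).
table = pd.DataFrame(
    {'K00001': [10.0, 30.0, 50.0, 70.0], 'K00002': [1.0, 3.0, 5.0, 7.0]},
    index=['s1', 's2', 's3', 's4'],
)
# Toy metadata: sample 's4' is deliberately missing.
meta = pd.DataFrame({'pasteurized': ['N', 'N', 'Y']}, index=['s1', 's2', 's3'])

# Same pattern as in the notebook: join on the sample index, then average per group.
merged = table.merge(meta[['pasteurized']], left_index=True, right_index=True)
group_means = merged.groupby('pasteurized').mean()

print(sorted(merged.index))  # 's4' is no longer present
print(group_means)
```

Comparing `len(table)` with `len(merged)` after the merge is a quick way to check whether any samples were lost.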

In [12]:
ko_meta_avg.head()

| pasteurized | K00001 | K00002 | K00003 | K00004 | K00005 | K00007 | K00008 | K00009 | K00010 | K00011 | ... | K19776 | K19777 | K19778 | K19779 | K19780 | K19784 | K19785 | K19788 | K19789 | K19791 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N | 45212.003827 | 185.908689 | 37531.219524 | 14012.060133 | 5540.705731 | 39.233664 | 26725.283658 | 8666.802792 | 4928.60307 | 4e-06 | ... | 206.156283 | 198.177837 | 221.580649 | 1.507643e-38 | 8.650003 | 3140.075768 | 4.876630000000001e-125 | 9.019276e-55 | 2227.031159 | 0.520927 |
| Y | 48813.259252 | 322.07122 | 49677.769682 | 21221.676186 | 10779.027223 | 237.477202 | 36895.098419 | 16783.979802 | 18479.601949 | 0.000676 | ... | 431.210745 | 204.207785 | 236.545723 | 8.246361999999999e-38 | 12.900228 | 10312.811629 | 6.06288e-125 | 1.1213230000000001e-54 | 6174.098661 | 10.168494 |


In [13]:
# find top x% of the most abundant KOs, ECs and pathways in each sample type

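def find_most_abundant(df: pd.DataFrame, frac):
    """Return the most abundant features for each row (sample type) of df.

    A frac strictly between 0 and 1 is read as a fraction of all columns;
    any other value is used directly as the number of features to keep.
    """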
    if 0 < frac < 1:
        frac = int(frac * len(df.columns))
    print(f'Saving {frac} most abundant features...')
    most_abundant = {
        smp: df.loc[smp, :].sort_values(ascending=False)[:frac]
        for smp in df.index
    }
    return most_abundant

ko_most_abundant = find_most_abundant(ko_meta_avg, 0.01)
ec_most_abundant = find_most_abundant(ec_meta_avg, 0.03)
pa_most_abundant = find_most_abundant(pa_meta_avg, 5)

Saving 101 most abundant features...
Saving 84 most abundant features...
Saving 5 most abundant features...
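
The `frac` argument does double duty: values strictly between 0 and 1 are converted to a feature count, while anything else is used as a count as-is. A minimal sketch of that rule in isolation; `resolve_top_n` is a hypothetical helper name used only for illustration.

```python
def resolve_top_n(frac, n_features):
    """Mirror the rule in find_most_abundant: fractions become counts, other values pass through."""
    if 0 < frac < 1:
        # int() truncates toward zero, so 2.5 becomes 2
        return int(frac * n_features)
    return frac


print(resolve_top_n(0.25, 10))  # 2
print(resolve_top_n(5, 10))     # 5
print(resolve_top_n(1, 10))     # 1
```

Two consequences follow from this rule: passing `1` keeps a single feature rather than 100% of them, and a very small fraction on a narrow table truncates to `0`, which yields an empty selection.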


In [15]:
print(f'10 most abundant KOs in the treatment group are: {ko_most_abundant["Y"].index[:10].tolist()}\n'
      f'10 most abundant KOs in the non-treatment group are: {ko_most_abundant["N"].index[:10].tolist()}\n')

10 most abundant KOs in the treatment group are: ['K01990', 'K02015', 'K01992', 'K00059', 'K03088', 'K02529', 'K02016', 'K02013', 'K00626', 'K07090']
10 most abundant KOs in the non-treatment group are: ['K01990', 'K01992', 'K02015', 'K03088', 'K00059', 'K02529', 'K00626', 'K02016', 'K02013', 'K00666']
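
The two top-10 lists are easier to compare as sets. A minimal sketch using the IDs printed above, copied by hand from that output; the variable names are illustrative.

```python
# Top-10 KOs copied from the output above.
top_y = ['K01990', 'K02015', 'K01992', 'K00059', 'K03088',
         'K02529', 'K02016', 'K02013', 'K00626', 'K07090']
top_n = ['K01990', 'K01992', 'K02015', 'K03088', 'K00059',
         'K02529', 'K00626', 'K02016', 'K02013', 'K00666']

shared = set(top_y) & set(top_n)          # KOs present in both lists
only_y = sorted(set(top_y) - set(top_n))  # unique to the pasteurized group
only_n = sorted(set(top_n) - set(top_y))  # unique to the non-pasteurized group

print(len(shared), only_y, only_n)  # 9 ['K07090'] ['K00666']
```

Nine of the ten KOs are shared, and the two groups differ only in their tenth entry and in the ranking order.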



In [18]:
for smp in ko_most_abundant.keys():
    fetch_ipath(ko_most_abundant[smp].index.tolist(), f'{data_dir}/kos_{smp}.svg')
    fetch_ipath(ec_most_abundant[smp].index.str.replace(':', '').tolist(), f'{data_dir}/ecs_{smp}.svg')

Nothing particularly interesting stands out in these maps.
Let's continue with enriched pathways.

In [8]:
! qiime composition add-pseudocount \
    --i-table $data_dir/picrust2_results/pathway_abundance.qza \
    --o-composition-table $data_dir/picrust2_results/pathway_abundance_differences.qza

Saved FeatureTable[Composition] to: CE/picrust2_results/pathway_abundance_differences.qza
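
The pseudocount step matters because the ANCOM call below uses a log transform, and the logarithm of zero is undefined. A minimal sketch of the idea on made-up counts, assuming a pseudocount of 1 (believed to be the default of `add-pseudocount`; check `--help` to confirm for your QIIME 2 version).

```python
import math

counts = [0, 3, 12]   # made-up abundances for one sample
pseudocount = 1       # assumption: matches the add-pseudocount default

# Shift every value so that zeros become strictly positive.
shifted = [c + pseudocount for c in counts]
# The log transform is now defined for every entry.
logged = [math.log(c) for c in shifted]

print(shifted)    # [1, 4, 13]
print(logged[0])  # 0.0
```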

In [9]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_differences.qza \
    --m-metadata-file $data_dir/food-metadata.tsv \
    --m-metadata-column pasteurized \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_pasteurized.qzv

Saved Visualization to: CE/pa_ancom_pasteurized.qzv

In [4]:
Visualization.load(f'{data_dir}/pa_ancom_pasteurized.qzv')

Three significantly different pathways were found for pasteurized vs. non-pasteurized samples:

- GLUCOSE1PMETAB-PWY (465)
- PWY0-1533 (420)
- PWY-6397 (420)

In [5]:
! qiime composition ancom \
    --i-table $data_dir/picrust2_results/pathway_abundance_differences.qza \
    --m-metadata-file $data_dir/food-metadata.tsv \
    --m-metadata-column rindtype \
    --p-transform-function log \
    --o-visualization $data_dir/pa_ancom_rindtype.qzv

Saved Visualization to: CE/pa_ancom_rindtype.qzv

In [2]:
Visualization.load(f'{data_dir}/pa_ancom_rindtype.qzv')

Many pathways differ by rind type, but the volcano plot looks odd: the points run from the bottom-left corner to the top-right corner.