In [1]:
import unittest
import numpy as np
import pandas as pd

Below, we read the metadata for the mouse fecal samples collected during the study:

In [2]:
# reading metadata (n=208)
mapping = pd.read_table("./data/haddad_6week_metadata.txt", sep = '\t', dtype = str, index_col = "#SampleID")
mapping.shape

(208, 69)

We load the microbiome feature table.
The microbiome (16S amplicon sequencing) feature table gives the relative abundance of each unique 16S sequence per sample. It was generated with the default parameters of the deblur workflow in QIITA.

In [3]:
microb = pd.read_table("./data/haddad_6week_deblur_otus_unrare.txt", sep = '\t', dtype = str)

In [4]:
#converting microbiome feature table to biom HDF5 format
! biom convert -i ./data/haddad_6week_deblur_otus_unrare.txt \
  -o ./data/haddad_6week_deblur_otus_unrare_hdf5.biom --table-type "OTU table" \
  --to-hdf5

In [25]:
# converting microbiome biom table to QIIME2 artifact (https://docs.qiime2.org/2017.12/interfaces/artifact-api/)
! qiime tools import \
  --input-path ./data/haddad_6week_deblur_otus_unrare_hdf5.biom \
  --type 'FeatureTable[Frequency]' \
  --source-format BIOMV210Format \
  --output-path ./data/haddad_6week_deblur_otus_unrare_hdf5.qza

Next, we rarefy the deblur sOTU table to an even sampling depth so that differences in sequencing depth between samples do not mask biological signal.
- A sampling depth of 2000 was chosen after interactive inspection of the deblur sOTU table in the QIIME2 viewer. 182 samples (#sequences > 2000) were retained.
- We save a biom 2.0 version of the rarefied sOTU table for downstream analysis.
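
To make the idea of rarefaction concrete, here is a minimal NumPy sketch of subsampling one sample's counts to a fixed depth without replacement. It is an illustration only, not the QIIME2 implementation; the function name `rarefy_counts` and the `seed` parameter are assumptions introduced for this example.

```python
import numpy as np


def rarefy_counts(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads without replacement.

    Hypothetical helper for illustration; returns None when the sample has
    fewer than `depth` reads, mirroring the fact that shallow samples are dropped.
    """
    counts = np.asarray(counts, dtype=int)
    if counts.sum() < depth:
        return None
    rng = np.random.default_rng(seed)
    # expand counts into one entry per read, labelled by feature position
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=depth, replace=False)
    # count how many of the kept reads belong to each feature
    return np.bincount(kept, minlength=counts.size)


example = rarefy_counts([5, 0, 10, 5], depth=8)
print(example, example.sum())
```

Every retained sample ends up with exactly `depth` reads, and no feature can receive more reads than it had before subsampling.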

In [26]:
! qiime feature-table rarefy --i-table ./data/haddad_6week_deblur_otus_unrare_hdf5.qza \
--p-sampling-depth 2000 \
--o-rarefied-table ./data/haddad_6week_deblur_otus_rare2k.qza

! qiime tools export \
  ./data/haddad_6week_deblur_otus_rare2k.qza \
  --output-dir ./data/biom_rare2k_exp

Saved FeatureTable[Frequency] to: ./data/haddad_6week_deblur_otus_rare2k.qza


In [27]:
# convert biom table to tab-delimited
! biom convert -i ./data/biom_rare2k_exp/feature-table.biom \
  -o ./data/haddad_6week_deblur_otus_rare2k.txt --table-type "OTU table" \
  --to-tsv

# update 'microb' dataframe to rarefied feature table
microb = pd.read_table("./data/haddad_6week_deblur_otus_rare2k.txt", sep = '\t', dtype = str, skiprows = 1,
                      index_col = "#OTU ID")

In [None]:
microb.head(3)

We load the metabolome feature table. The metabolome (untargeted LC-MS/MS) feature table gives the relative abundance of small-molecule features (named [massToCharge]_[retentionTime]) per sample. It was generated using MZmine2.
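
Because each feature ID encodes two numbers, it can be split back into its parts when needed. The snippet below is a small sketch of that; `parse_feature_id` is a hypothetical helper that is not part of the original notebook, and it assumes every ID contains exactly one underscore.

```python
def parse_feature_id(feature_id):
    """Split a '[massToCharge]_[retentionTime]' ID into two floats."""
    mz, rt = feature_id.split("_")
    return float(mz), float(rt)


# the internal standard used later in this notebook
mz, rt = parse_feature_id("278.1896579171936_3.8181483173076907")
print(mz, rt)
```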

In [29]:
# read metabolomic feature table
metab = pd.read_table("./data/haddad_6week_allFeatures_nonorm.txt", sep = '\t', dtype = str, index_col = "#featureID")
metab.head(3)

#featureID,10422.17.F.6,10422.17.F.2,10422.17.F.5,10422.17.F.7,10422.17.F.10,10422.17.F.4,10422.17.F.3,10422.17.F.8,10422.18.F.6,10422.18.F.2,...,10422.32.F.3,10422.32.F.7,10422.31.F.10,10422.32.F.12,10422.32.F.4,10422.32.F.13,10422.31.F.13,10422.32.F.9,10422.31.F.12,10422.32.F.11
132.1020923761221_0.4491537660256411,2894960860.988594,9610168467.776846,11822630149.85316,8237728873.128403,8141911119.798736,4635300446.851906,7689054931.586625,7823334115.189182,1670301666.0984683,12743426190.491224,...,12229052331.56484,8572231265.031997,9341332384.041187,4216869721.4174366,5433595592.99014,13011845333.573378,12588946976.28194,14050915077.632828,8432949753.442624,14761605463.249516
166.0862755041856_0.5858793269230769,1130593477.2556405,5955111906.130312,6746907823.730342,4111389704.534751,4110870394.659532,2524417564.67322,4109542013.7630014,5900437305.735503,487884818.7955626,8976942725.740124,...,8423897075.956502,5415180143.984718,5469345002.638374,2188625725.4779987,3125408082.6838765,9332327131.924,8416375498.222314,10852542398.524624,5641343370.020001,10800003954.566751
357.2784138555112_5.010000161030595,1892633838.037004,2092870323.073,2523493777.847994,1968808086.0177505,2088557820.1722808,3382040538.312136,2686549002.9608774,2523475211.4992456,1069906194.0879337,1878606502.437749,...,1516592900.9018714,2099398512.921627,2753771168.59438,2282654193.257812,2363335612.8312488,2650666432.169661,2243799569.820179,2755781674.588057,3552835060.5919967,2777342410.653752


The metabolomics feature table is PQN-normalized to an internal standard using the code below
(internal standard: m/z = 278.1896579171936; retention time = 3.8181483173076907 min)

In [30]:
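# PQN normalization
int_stand = '278.1896579171936_3.8181483173076907'
# the table was read as strings, so convert every value to a number first
metab = metab.apply(pd.to_numeric)
ref = metab.loc[int_stand].copy()
se_zero = []  # features with a zero size effect (left unscaled)

for feature in metab.index:
    # size effect: median across samples of the feature-to-standard ratio
    size_effect = np.median(metab.loc[feature] / ref)
    if size_effect == 0:
        se_zero.append(feature)
        continue
    metab.loc[feature] = metab.loc[feature] / size_effect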
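
The per-feature scaling in the loop above can be checked on a toy table. The sketch below is a vectorized version of the same arithmetic under the assumption that the table is already numeric; the table `toy` and its row and column labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# toy table: rows are features, columns are samples (made-up values)
toy = pd.DataFrame(
    [[2.0, 4.0, 8.0],
     [1.0, 4.0, 4.0],
     [0.0, 0.0, 0.0]],
    index=["standard", "feature_a", "feature_b"],
    columns=["s1", "s2", "s3"],
)

ref = toy.loc["standard"]
# median across samples of each feature's ratio to the internal standard
size_effect = toy.div(ref, axis=1).median(axis=1)

# scale only features whose size effect is non-zero, as in the loop
nonzero = size_effect != 0
scaled = toy.copy()
scaled.loc[nonzero] = toy.loc[nonzero].div(size_effect[nonzero], axis=0)
print(scaled)
```

Here `feature_a` has ratios 0.5, 1.0 and 0.5 to the standard, so its size effect is 0.5 and its values double, while the all-zero `feature_b` is left untouched.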

In [31]:
# save normalized table
metab.to_csv("./data/haddad_6week_allFeatures_pqn.txt", sep='\t')

In high-throughput multi-omics experiments, some samples are expected not to generate data reliable enough for further analysis. These samples are dropped during the initial quality check. We therefore match the samples present in the metadata, microbiome and metabolome feature tables and keep only the samples common to all three.

In [32]:
def match_samples(mic, met, md):
    """Keep only the samples (row labels) shared by all three tables, in sorted order."""
    matched = sorted(set(mic.index) & set(met.index) & set(md.index))
    return mic.loc[matched], met.loc[matched], md.loc[matched]


# quick test on toy tables: only 's2' and 's3' occur in all three
m1 = pd.DataFrame([[0, 0, 1],
                   [2, 2, 4],
                   [5, 5, 3]],
                  index=['s1', 's2', 's3'],
                  columns=['o1', 'o2', 'o3'])
_mapping = pd.DataFrame([['a', 'control'],
                         ['b', 'control'],
                         ['c', 'diseased'],
                         ['d', 'diseased']],
                        index=['s1', 's2', 's3', 's4'],
                        columns=['Barcode', 'Treatment'])
m2 = pd.DataFrame([[2, 4, 4],
                   [5, 3, 3],
                   [0, 0, 1]],
                  index=['s2', 's3', 's4'],
                  columns=['c1', 'c2', 'c3'])
# compare as a list: `Index == list` is element-wise and cannot be asserted directly
assert list(match_samples(m1, m2, _mapping)[0].index) == ['s2', 's3']
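
When matching removes samples, it can help to see which table loses which samples. The following is a minimal sketch of such a report; `report_dropped` is a hypothetical helper, not part of the original analysis, and it assumes each table is indexed by sample ID as in `match_samples`.

```python
import pandas as pd


def report_dropped(tables):
    """Return, per table name, the sorted sample IDs missing from the shared set."""
    shared = set.intersection(*(set(t.index) for t in tables.values()))
    return {name: sorted(set(t.index) - shared) for name, t in tables.items()}


# toy tables indexed by sample ID (made-up values)
toy_mic = pd.DataFrame({"o1": [0, 2, 5]}, index=["s1", "s2", "s3"])
toy_met = pd.DataFrame({"c1": [2, 5, 0]}, index=["s2", "s3", "s4"])
toy_md = pd.DataFrame({"Treatment": ["control"] * 4},
                      index=["s1", "s2", "s3", "s4"])

dropped = report_dropped({"microbiome": toy_mic,
                          "metabolome": toy_met,
                          "metadata": toy_md})
print(dropped)
```

In this toy case the shared samples are `s2` and `s3`, so the microbiome table loses `s1`, the metabolome table loses `s4`, and the metadata loses both.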

In [33]:
mic_match, met_match, mapping_match=match_samples(microb.T, metab.T, mapping)

In [34]:
print(mic_match.shape)
print(met_match.shape)
print(mapping_match.shape)

(182, 735)
(182, 1710)
(182, 69)


In [39]:
met_match = met_match.reset_index().rename(columns = {'index': '#SampleID'})
mic_match = mic_match.reset_index().rename(columns = {'index': '#SampleID'})
mapping_match = mapping_match.reset_index()

In [44]:
# save matched tables for downstream analysis
mapping_match.to_csv("./data/haddad_6weeks_metadata_matched.txt", sep ='\t', index = False)
mic_match.to_csv("./data/haddad_6weeks_deblur_otus_rare2k_matched.txt", sep ='\t', index = False)
met_match.to_csv("./data/haddad_6weeks_allFeatures_pqn_matched.txt", sep ='\t', index = False)

In [5]:
# convert matched feature tables to biom
! biom convert -i ./data/haddad_6weeks_deblur_otus_rare2k_matched.txt \
  -o ./data/haddad_6weeks_deblur_otus_rare2k_matched.biom --table-type "OTU table" \
  --to-hdf5

! biom convert -i ./data/haddad_6weeks_allFeatures_pqn_matched.txt \
  -o ./data/haddad_6weeks_allFeatures_pqn_matched.biom --table-type "OTU table" \
  --to-hdf5