This notebook is mostly scratch work for the "sample size" simulation.

Specifically, I'll start with a case-control dataset (to simulate a "real" microbiome effect) and:

- subsample cases and controls at different total N's
- also subsample with different proportions of cases/controls
- calculate significant associations
    - need to remove OTUs which are completely zero before doing this
    - and then correct for multiple tests
- and see what percent of the "real" associations it recovers ("real" associations calculated from the whole dataset)
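
The zero-OTU filtering step in the list above could look something like the sketch below. The helper name `drop_empty_otus` and the toy table are made up for illustration; the only assumption carried over from this notebook is that samples are in rows and OTUs in columns.

```python
import pandas as pd

def drop_empty_otus(df, samples):
    """Keep only the rows in `samples`, and only the OTUs (columns)
    that are nonzero in at least one of those samples.
    Hypothetical helper, not part of util.py."""
    sub = df.loc[samples]
    keep = (sub != 0).any(axis=0)
    return sub.loc[:, keep]

# Toy abundance table: 3 samples x 3 OTUs (made-up numbers)
toy = pd.DataFrame(
    {'otu_a': [0.0, 0.0, 0.2],
     'otu_b': [0.1, 0.0, 0.0],
     'otu_c': [0.0, 0.0, 0.0]},
    index=['s1', 's2', 's3'])

# otu_a is only present in s3, so it disappears once we subsample to s1 and s2
filtered = drop_empty_otus(toy, ['s1', 's2'])
print(list(filtered.columns))
```

The filter has to be applied per subsample rather than once on the full table, because an OTU that is present in the whole dataset can still be all zero in a small subset.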

In this exploration, I also need to make sure that the bugs which come out significant in a smaller subset are the same ones that are significant in the full dataset...

In [42]:
import pandas as pd
import numpy as np
import feather

from scipy.stats.mstats import kruskalwallis
from scipy.stats import ranksums, mannwhitneyu
# FDR correction
from statsmodels.stats.multitest import multipletests


In [50]:
# Taken from util.py in microbiomeHD
def univariate_one_col(xvals, yvals, pfun):
    try:
        h, p = pfun(xvals, yvals)
    except ValueError:
        p = 1
        h = 0
    return pd.Series([p, h])

def compare_otus_teststat(df, Xsmpls, Ysmpls, method='kruskal-wallis', multi_comp=None):
    """
    Compares each column (OTU) of df between Xsmpls and Ysmpls using the
    statistical test given by method.
    Returns dataframe with the p-values ('p') and test statistics ('test_stat'),
    plus FDR-corrected q-values ('q') if multi_comp='fdr'.

    parameters
    ----------
    df             dataframe, samples are in rows and OTUs in columns
    X,Ysmpls       list of samples to compare
    method         statistical method to use for comparison: 'kruskal-wallis',
                   'wilcoxon' (or 'ranksums'), or 'mann-whitney'
    multi_comp     str, type of multiple comparison test to do.
                   Currently accepts 'fdr' or None

    outputs
    -------
    results        dataframe with OTUs in rows and 'p' and 'test_stat'
                   (and optionally 'q') in columns

    """
    if method == 'kruskal-wallis':
        pfun = kruskalwallis
    elif method == 'wilcoxon' or method == 'ranksums':
        pfun = ranksums
    elif method == 'mann-whitney':
        pfun = mannwhitneyu
        # Note: prob wanna add some kwargs here to say whether 2sided or not
    else:
        # Without this, an unrecognized method fails later with a NameError on pfun
        raise ValueError("Unknown method: {}".format(method))

    results = df.apply(
        lambda col: univariate_one_col(col.loc[Xsmpls].values, col.loc[Ysmpls].values, pfun)).T
    results.columns = ['p', 'test_stat']

    if multi_comp == 'fdr':
        _, results['q'], _, _ = multipletests(results['p'], method='fdr_bh')

    return results

def read_dataframe(fn):
    df = feather.read_dataframe(fn)
    df.index = df.iloc[:,0]
    df = df.iloc[:, 1:]
    
    return df
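
For reference, here is a small numpy sketch of the Benjamini-Hochberg adjustment, which is what I understand `method='fdr_bh'` to compute. The function name `bh_qvalues` and the p-values are invented for illustration; this is not the statsmodels implementation. The point is to see how the q-values depend on the number of tests, which is why OTUs that are all zero (and so can never be significant) should be dropped before correcting.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (illustrative sketch)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q

pvals = [0.01, 0.04, 0.03, 0.5]
q = bh_qvalues(pvals)

# Same p-values, plus four uninformative tests with p = 1
# (e.g. features that are all zero in both groups)
q_padded = bh_qvalues(pvals + [1.0] * 4)

print(q[0], q_padded[0])
```

In this toy case, doubling the number of tests with uninformative features doubles the smallest q-value, from 0.04 to 0.08.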

# The setup

The eventual plots will be faceted by FMT response rate, x-axis = total N, y-axis = % of max rejections, and each line will be a study.

So I'll need to make a dataframe with the following columns:
- % cases (i.e. FMT response rate)
- total N (x axis)
- number reject (y axis)
- total reject (for that study)
- study (hue)

## CRC Baxter

Let's start with one dataset, CRC Baxter.

In [51]:
fotu = '../../data/clean/crc_baxter.otu_table.feather'
fgenus = '../../data/clean/crc_baxter.otu_table.genus.feather'
fmeta = '../../data/clean/crc_baxter.metadata.feather'

df, genusdf, meta = (read_dataframe(f) for f in [fotu, fgenus, fmeta])
df.shape, genusdf.shape, meta.shape

((490, 18448), (490, 163), (490, 72))

In [52]:
df.head()

Unnamed: 0_level_0,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Clostridium_XI;s__;d__denovo15989,k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__;d__denovo15988,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__;s__;d__denovo723,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__;d__denovo722,k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Prevotellaceae;g__Prevotella;s__;d__denovo12511,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__;d__denovo12510,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__;s__;d__denovo12516,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__;s__;d__denovo20538,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;s__;d__denovo20532,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Ruminococcus;s__;d__denovo20533,...,k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Verrucomicrobiaceae;g__Akkermansia;s__;d__denovo14612,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Lachnospiracea_incertae_sedis;s__;d__denovo20807,k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Coriobacteriales;f__Coriobacteriaceae;g__Enterorhabdus;s__;d__denovo19190,k__Bacteria;p__Firmicutes;c__Negativicutes;o__Selenomonadales;f__Acidaminococcaceae;g__Phascolarctobacterium;s__;d__denovo19192,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__;s__;d__denovo19193,k__Bacteria;p__Verrucomicrobia;c__Verrucomicrobiae;o__Verrucomicrobiales;f__Verrucomicrobiaceae;g__Akkermansia;s__;d__denovo19195,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Dorea;s__;d__denovo19196,k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__;s__;d__denovo19197,k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Bifidobacteriales;f__Bifidobacteriaceae;g__Bifidobacterium;s__;d__denovo19198,k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__;d__denovo20801
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2045653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2087650,0.0,0.0,0.0,0.000717,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2963670,0.0,0.0,0.0,6e-05,0.0,0.0,0.0,6e-05,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2527670,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3453650,0.0,0.0,0.0,5.2e-05,0.0,0.0,0.0,0.0,0.0,0.0,...,7.8e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [53]:
meta.groupby(['DiseaseState']).size()

DiseaseState
CRC       120
H         172
nonCRC    198
dtype: int64

Let's do CRC vs. H

In [47]:
h = meta.query('DiseaseState == "H"').index.tolist()
crc = meta.query('DiseaseState == "CRC"').index.tolist()

p = compare_otus_teststat(df, h, crc, method='kruskal-wallis', multi_comp='fdr')

In [48]:
p.sort_values(by='q').head(10)

Unnamed: 0,p,test_stat,q
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Parvimonas;s__;d__denovo596,4.200885e-10,39.017198,8e-06
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas;s__;d__denovo404,2.87336e-08,30.791213,0.000265
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas;s__;d__denovo216,7.757245e-08,28.865752,0.000477
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas;s__;d__denovo196,1.368985e-07,27.766179,0.000622
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Peptostreptococcus;s__;d__denovo670,1.68548e-07,27.363906,0.000622
k__Bacteria;p__Firmicutes;c__Negativicutes;o__Selenomonadales;f__Veillonellaceae;g__Dialister;s__;d__denovo1597,8.12235e-07,24.328614,0.002497
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Eubacteriaceae;g__Eubacterium;s__;d__denovo664,3.289453e-06,21.639854,0.008669
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Actinomyces;s__;d__denovo2218,2.371054e-05,17.865299,0.054677
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Peptostreptococcus;s__;d__denovo1172,2.769373e-05,17.5699,0.056766
k__Bacteria;p__Fusobacteria;c__Fusobacteriia;o__Fusobacteriales;f__Fusobacteriaceae;g__Fusobacterium;s__;d__denovo1822,0.0001204994,14.784855,0.178292


In [49]:
alpha = 0.05
sum(p['q'] <= alpha)

7

### Check genus-level full results

In [54]:
pgenus = compare_otus_teststat(genusdf, h, crc, method='kruskal-wallis', multi_comp='fdr')

In [58]:
pgenus.sort_values(by='q').head(10)

Unnamed: 0,p,test_stat,q
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas,6.042793e-10,38.307565,4.924876e-08
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Parvimonas,3.289881e-10,39.49448,4.924876e-08
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Peptostreptococcus,1.174227e-09,37.011697,6.379968e-08
k__Bacteria;p__Fusobacteria;c__Fusobacteriia;o__Fusobacteriales;f__Fusobacteriaceae;g__Fusobacterium,3.155495e-07,26.152126,1.285864e-05
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Anaerococcus,1.197008e-05,19.16804,0.0003902247
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella,0.000943122,10.936027,0.02069983
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Catenibacterium,0.001015943,10.798283,0.02069983
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Peptoniphilus,0.0008253342,11.183377,0.02069983
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Clostridium_XVIII,0.001305592,10.334604,0.02364571
k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus,0.001798781,9.744316,0.02932013


In [59]:
sum(pgenus['q'] <= 0.05)

16

## Subsample evenly

Let's just try it once to check that the same bugs come out as significant. We'll subsample cases and controls to a large fraction (60%) of their original group sizes, just to make sure we still get hits.

In [60]:
# Set random seed
random_state = 12345
np.random.seed(random_state)

In [61]:
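# NB: np.random.choice samples with replacement by default, so the same
# sample can be drawn more than once (pass replace=False to avoid this)
subh = np.random.choice(h, size=int(np.ceil(0.6*len(h))))
subcrc = np.random.choice(crc, size=int(np.ceil(0.6*len(crc))))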

In [62]:
subp = compare_otus_teststat(df, subh, subcrc, method='kruskal-wallis', multi_comp='fdr')

In [63]:
subp.sort_values(by='q').head(10)

Unnamed: 0,p,test_stat,q
k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__;d__denovo3235,3e-06,21.897024,0.026536
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__;s__;d__denovo362,2e-06,22.924502,0.026536
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Parvimonas;s__;d__denovo596,5e-06,21.032451,0.027769
k__Bacteria;p__Proteobacteria;c__Deltaproteobacteria;o__Desulfovibrionales;f__Desulfovibrionaceae;g__Desulfovibrio;s__;d__denovo387,9e-06,19.632208,0.043294
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Blautia;s__;d__denovo2307,1.7e-05,18.51677,0.062139
k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__Rhodospirillaceae;g__;s__;d__denovo410,4e-05,16.871077,0.123007
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__;s__;d__denovo5383,9.5e-05,15.223713,0.155675
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptostreptococcaceae;g__Peptostreptococcus;s__;d__denovo670,0.000225,13.610715,0.155675
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__;g__;s__;d__denovo517,6.2e-05,16.054263,0.155675
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella;s__;d__denovo8715,0.000225,13.610817,0.155675


Let's also check genus-level results.

In [64]:
subpgenus = compare_otus_teststat(genusdf, subh, subcrc, method='kruskal-wallis', multi_comp='fdr')

In [65]:
subpgenus.sort_values(by='q').head(10)

Unnamed: 0,p,test_stat,q
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Parvimonas,1e-06,23.819784,0.000172
k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus,3e-06,21.897024,0.000234
k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Porphyromonas,0.000189,13.942037,0.007668
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella,0.000235,13.52654,0.007668
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiales_Incertae_Sedis_XI;g__Anaerococcus,0.000232,13.548569,0.007668
k__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobrevibacter,0.00029,13.13502,0.007874
k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Oxalobacteraceae;g__Oxalobacter,0.000816,11.203638,0.018413
k__Bacteria;p__Firmicutes;c__Erysipelotrichia;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Catenibacterium,0.000904,11.015114,0.018413
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium,0.00115,10.568925,0.020829
k__Bacteria;p__Fusobacteria;c__Fusobacteriia;o__Fusobacteriales;f__Fusobacteriaceae;g__Fusobacterium,0.001864,9.678772,0.030385


I think this looks fine. Let's do both OTU- and genus-level analyses moving forward (because why not!). Genus-level takes less time and is easier to compare with my meta-analysis, but OTU-level is probably closer to what clinicians do in real life.
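
To make the "same bugs" check less of an eyeball comparison, one option is to compare the sets of significant features directly. This is a sketch only: `compare_hits` is a hypothetical helper, and the q-values below are invented toy numbers, not results from this dataset.

```python
import pandas as pd

def compare_hits(q_full, q_sub, alpha=0.05):
    """Compare significant features between the full dataset and a subsample.
    q_full, q_sub: Series of q-values indexed by feature name."""
    full_hits = set(q_full[q_full <= alpha].index)
    sub_hits = set(q_sub[q_sub <= alpha].index)
    shared = full_hits & sub_hits
    if full_hits:
        frac = float(len(shared)) / len(full_hits)
    else:
        frac = float('nan')
    return {'n_full': len(full_hits),
            'n_sub': len(sub_hits),
            'n_shared': len(shared),
            'frac_recovered': frac,
            # Significant in the subsample but not in the full dataset
            'new_in_sub': sorted(sub_hits - full_hits)}

# Toy q-values (made up)
q_full = pd.Series({'g1': 0.001, 'g2': 0.03, 'g3': 0.2, 'g4': 0.6})
q_sub = pd.Series({'g1': 0.02, 'g2': 0.3, 'g3': 0.04, 'g4': 0.9})

summary = compare_hits(q_full, q_sub)
print(summary)
```

With the objects defined above, the call would be something like `compare_hits(p['q'], subp['q'])` at the OTU level and `compare_hits(pgenus['q'], subpgenus['q'])` at the genus level. Tracking `new_in_sub` separately matters because counting rejections alone cannot distinguish recovered associations from new ones that only appear in the subsample.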

## Plan for entire simulation

I think we'll want two output files from the simulation:

- file with number of rejections, with the following columns:
    - total N
    - percent case
    - study
    - alpha level (FDR - so can go up to 0.25)
    - total original rejected
    - current total rejected
  
- and another with the OTU-level results, with these columns:
    - otu
    - study
    - total N
    - percent case
    - pval
    - qval
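
A possible way to build the rows of the first file from two sets of q-values is sketched below. The function name `rejection_rows`, the column names, and the default alpha levels are placeholders I picked to match the list above, and the q-values are toy numbers.

```python
import pandas as pd

COLUMNS = ['study', 'total_n', 'perc_case', 'alpha',
           'total_reject', 'n_reject']

def rejection_rows(q_all, q_sub, study, total_n, perc_case,
                   alphas=(0.05, 0.1, 0.25)):
    """One row per alpha level: number of rejections in the full dataset
    (total_reject) and in the current subsample (n_reject)."""
    rows = []
    for alpha in alphas:
        rows.append({'study': study,
                     'total_n': total_n,
                     'perc_case': perc_case,
                     'alpha': alpha,
                     'total_reject': int((q_all <= alpha).sum()),
                     'n_reject': int((q_sub <= alpha).sum())})
    return pd.DataFrame(rows, columns=COLUMNS)

# Toy q-values (made up), full dataset vs. one subsample
q_all = pd.Series([0.01, 0.08, 0.2, 0.5])
q_sub = pd.Series([0.04, 0.3, 0.12, 0.9])

out = rejection_rows(q_all, q_sub, study='toy_study',
                     total_n=50, perc_case=0.5)
print(out)
```

Rows from every (study, total N, percent case) combination can then be stacked with `pd.concat` and written out once at the end.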

In [None]:
ctrl = h
case = crc

## First, get the qvalues using the entire dataset
potu = compare_otus_teststat(
    df, ctrl, case, 
    method='kruskal-wallis', multi_comp='fdr')
pgenus = compare_otus_teststat(
    genusdf, ctrl, case, 
    method='kruskal-wallis', multi_comp='fdr')

In [67]:
print(sum(potu['q'] <= alpha), sum(pgenus['q'] <= alpha))

(7, 16)


In [73]:
## Set up simulation parameters
# Total N and % cases/controls needs to be limited by available N
maxN = len(case + ctrl)

# Total N is defined as a function of "reasonable" clinical trials
totalNs = [10, 25, 50, 75, 100, 150, 200]
perc_success = [0.1, 0.25, 0.5, 0.75, 0.9]

## Run simulation
for n in totalNs:
    for p in perc_success:
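        # Define number of cases and controls
        # NB: rounding both up means n_ctrl + n_case can exceed n by one
        # (e.g. n=10, p=0.25 gives 8 + 3 = 11)
        n_ctrl = int(np.ceil((1.0 - p)*n))
        n_case = int(np.ceil(p*n))
        
        if (n_ctrl > len(ctrl)) or (n_case > len(case)):
            # TODO: record np.nan for this combination instead of just skipping it
            print("Check your parameter settings, asking for too many samples!")
            continue
        
        print(n, p, n_ctrl, n_case)
        
        # Subsample our cases and controls
        # NB: np.random.choice samples with replacement by default
        subctrl = np.random.choice(ctrl, size=n_ctrl)
        subcase = np.random.choice(case, size=n_case)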
        
        # Calculate qvalues, OTU-level
        #psub = compare_otus_teststat(
        #    df, subctrl, subcase, 
        #    method='kruskal-wallis', multi_comp='fdr')
        
        # Calculate qvalues, genus-level
        psubgenus = compare_otus_teststat(
            genusdf, subctrl, subcase, 
            method='kruskal-wallis', multi_comp='fdr')

        #print(sum(psub['q'] <= alpha), sum(psubgenus['q'] <= alpha))
        print(sum(psubgenus['q'] <= alpha))

(10, 0.1, 9, 1)
0
(10, 0.25, 8, 3)
0
(10, 0.5, 5, 5)
0
(10, 0.75, 3, 8)
0
(10, 0.9, 1, 9)
0
(25, 0.1, 23, 3)
0
(25, 0.25, 19, 7)
0
(25, 0.5, 13, 13)
0
(25, 0.75, 7, 19)
0
(25, 0.9, 3, 23)
0
(50, 0.1, 45, 5)
1
(50, 0.25, 38, 13)
0
(50, 0.5, 25, 25)
0
(50, 0.75, 13, 38)
0
(50, 0.9, 5, 45)
3
(75, 0.1, 68, 8)
3
(75, 0.25, 57, 19)
0
(75, 0.5, 38, 38)
4
(75, 0.75, 19, 57)
0
(75, 0.9, 8, 68)
0
(100, 0.1, 90, 10)
2
(100, 0.25, 75, 25)
2
(100, 0.5, 50, 50)
0
(100, 0.75, 25, 75)
0
(100, 0.9, 10, 90)
0
(150, 0.1, 135, 15)
3
(150, 0.25, 113, 38)
6
(150, 0.5, 75, 75)
7
(150, 0.75, 38, 113)
0
Check your parameter settings, asking for too many samples!
Check your parameter settings, asking for too many samples!
(200, 0.25, 150, 50)
13
(200, 0.5, 100, 100)
15
Check your parameter settings, asking for too many samples!
Check your parameter settings, asking for too many samples!


Ok, this works pretty well, and I definitely want to see all the results together. It's very strange to me that the 10% and 90% splits sometimes do better than the 50% split! Need to do a few reps of this...
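
For the repeated runs, here is a sketch of a subsampling helper that differs from the loop above in two ways: it draws without replacement, and it makes the two group sizes add up to exactly the requested total. `subsample_groups` and the sample IDs are hypothetical; only the group sizes (120 CRC, 172 H) come from this dataset.

```python
import numpy as np

def subsample_groups(case, ctrl, total_n, perc_case, rng):
    """Draw cases and controls without replacement so that
    n_case + n_ctrl == total_n. Returns None if either group
    does not have enough samples."""
    # Round half up, then give the remainder to the controls
    n_case = int(np.floor(perc_case * total_n + 0.5))
    n_ctrl = total_n - n_case
    if n_case > len(case) or n_ctrl > len(ctrl):
        return None
    subcase = rng.choice(case, size=n_case, replace=False)
    subctrl = rng.choice(ctrl, size=n_ctrl, replace=False)
    return list(subcase), list(subctrl)

rng = np.random.RandomState(12345)
case = ['case%d' % i for i in range(120)]   # placeholder sample IDs
ctrl = ['ctrl%d' % i for i in range(172)]

# A few independent replicates of the same (N, % case) setting
n_reps = 3
draws = [subsample_groups(case, ctrl, 10, 0.25, rng) for _ in range(n_reps)]

# 90% of 200 would need 180 cases, more than the 120 available
too_big = subsample_groups(case, ctrl, 200, 0.9, rng)
print([(len(c), len(k)) for c, k in draws], too_big)
```

Each replicate could then be passed to `compare_otus_teststat` and tagged with a replicate number, so the plots can show the spread across reps rather than a single draw.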

## Scratch work to write power_simulation.py

In [77]:
DATASETS = ['cdi_schubert', 'crc_baxter', 'ibd_papa', 'ob_goodrich']

for dataset in DATASETS:
    fotu = '../../data/clean/' + dataset + '.otu_table.feather'
    fgenus = '../../data/clean/' + dataset + '.otu_table.genus.feather'
    fmeta = '../../data/clean/' + dataset + '.metadata.feather'

    df, genusdf, meta = (read_dataframe(f) for f in [fotu, fgenus, fmeta])
    print(dataset)
    print(meta.groupby(['DiseaseState']).size())
    print("")

cdi_schubert
DiseaseState
CDI        93
H         154
nonCDI     89
dtype: int64

crc_baxter
DiseaseState
CRC       120
H         172
nonCRC    198
dtype: int64

ibd_papa
DiseaseState
CD          23
IBDundef     1
UC          43
nonIBD      24
dtype: int64

ob_goodrich
DiseaseState
H     428
OB    185
OW    319
dtype: int64



In [79]:
# DataFrame.copy() makes a deep copy by default
tmpdf = pgenus.copy()

In [80]:
tmpdf.columns

Index([u'p', u'test_stat', u'q'], dtype='object')

In [81]:
tmpdf = tmpdf.rename(columns={'p': 'p_allsamples',
                              'q': 'q_allsamples',
                              'test_stat': 'test_stat_allsamples'})

In [83]:
pd.concat((tmpdf, pgenus), axis=1)

Unnamed: 0,p_allsamples,test_stat_allsamples,q_allsamples,p,test_stat,q
k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Leuconostocaceae;g__Leuconostoc,0.885686,0.020668,0.907967,0.885686,0.020668,0.907967
k__Bacteria;p__Firmicutes;c__Negativicutes;o__Selenomonadales;f__Veillonellaceae;g__Selenomonas,0.394988,0.723531,0.674756,0.394988,0.723531,0.674756
k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Methylobacteriaceae;g__Methylobacterium,0.152064,2.051424,0.467668,0.152064,2.051424,0.467668
k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__Ralstonia,0.195144,1.678349,0.532524,0.195144,1.678349,0.532524
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium,0.252247,1.310813,0.532524,0.252247,1.310813,0.532524
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Aeromonadales;f__Aeromonadaceae;g__Aeromonas,0.016093,5.792711,0.114049,0.016093,5.792711,0.114049
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Enterobacter,0.003626,8.462056,0.046417,0.003626,8.462056,0.046417
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Eubacteriaceae;g__Acetobacterium,0.927988,0.008168,0.939516,0.927988,0.008168,0.939516
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;g__Pantoea,0.715993,0.132365,0.830594,0.715993,0.132365,0.830594
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter,0.171321,1.871345,0.503981,0.171321,1.871345,0.503981
