


OTU tables and QIIME-compliant mapping files used to generate figures and statistics for the American Gut project. All data are de-identified. OTUs were picked against Greengenes 13_8 at 97% similarity using SortMeRNA.

American Gut Data

The American Gut tables hosted in this repository have not been updated since May 2015 and reflect an old version of the American Gut survey. The latest American Gut BIOM tables and mapping files can be found at ftp://ftp.microbio.me/AmericanGut/latest.

Data sources

The following studies are being used to provide context for the American Gut data:

Data prefixes

Each study used is described by an acronym:

  • AG, American Gut
  • HMP, Human Microbiome Project
  • GG, Global Gut
  • PGP, Personal Genome Project

File tags

The provided BIOM tables have a few different tags in the filenames to describe the included data.

  • 100nt - The sequences were trimmed to 100 nucleotides prior to OTU picking
  • even1k - The full table was rarefied to 1000 sequences per sample
  • even10k - The full table was rarefied to 10000 sequences per sample

The trimming is necessary when combining data from studies in which different sequencing technologies were used (e.g., HiSeq vs. MiSeq).
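To illustrate what the even1k and even10k tags mean, the sketch below subsamples one sample's OTU counts to a fixed depth without replacement. It is not the code used to build the tables in this repository. The function name `rarefy_sample` and the example counts are hypothetical, and dropping samples that fall below the target depth is an assumption of this sketch.

```python
import random


def rarefy_sample(counts, depth, rng):
    """Subsample a list of per-OTU counts to exactly `depth` sequences.

    Returns None when the sample has fewer than `depth` sequences
    (assumption: such samples are dropped from the rarefied table).
    """
    if sum(counts) < depth:
        return None
    # Expand the counts into one OTU index per observed sequence.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    # Draw `depth` sequences without replacement.
    picked = rng.sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


rng = random.Random(0)           # fixed seed so the draw is repeatable
sample = [1500, 400, 100, 0]     # hypothetical counts for four OTUs
rarefied = rarefy_sample(sample, 1000, rng)  # analogous to "even1k"
shallow = rarefy_sample([10, 5], 1000, rng)  # too few sequences: None
```

After rarefaction every retained sample has the same total count, which is what makes samples sequenced at different depths comparable.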

Debug data

The debug data files are sourced from the main data files, but are 10% random subsets (by sample) of what is in the main files. The purpose of the debug files is to reduce processing load on the results framework for testing purposes.
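A 10% random subset by sample could be drawn roughly as follows. This is a minimal sketch and not the script that produced the debug files: the function name `debug_subset`, the seed, and the sample IDs are all hypothetical.

```python
import random


def debug_subset(sample_ids, fraction=0.10, seed=0):
    """Return a reproducible random subset of sample IDs."""
    ids = list(sample_ids)
    rng = random.Random(seed)
    # Keep at least one sample when the input is non-empty.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return sorted(rng.sample(ids, k))


# Hypothetical sample IDs standing in for the columns of a BIOM table.
ids = [f"sample_{i:03d}" for i in range(200)]
subset = debug_subset(ids)
```

The selected IDs would then be used to filter both the BIOM table and the mapping file so that the two stay in sync.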