Data

OTU tables and QIIME-compliant mapping files used to generate figures and statistics for the American Gut project. All data are de-identified. OTUs were picked against Greengenes 13_8 at 97% identity using SortMeRNA.

American Gut Data

The American Gut tables hosted in this repository have not been updated since May 2015 and reflect an old version of the American Gut survey. The latest American Gut BIOM tables and mapping files are available at ftp://ftp.microbio.me/AmericanGut/latest.

Data sources

The following studies are being used to provide context for the American Gut data:

Data prefixes

Each study is identified by an acronym:

  • AG, American Gut
  • HMP, Human Microbiome Project
  • GG, Global Gut
  • PGP, Personal Genome Project
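The prefixes above can be kept as a small lookup table when scripting over the data files. This is a minimal sketch that assumes filenames begin with the study acronym followed by `_` or `.` (for example the hypothetical `AG_100nt_even10k.biom`); the function name `study_for_file` is illustrative, not part of this repository.

```python
import os
import re

# Study acronyms as listed in this README.
STUDY_PREFIXES = {
    "AG": "American Gut",
    "HMP": "Human Microbiome Project",
    "GG": "Global Gut",
    "PGP": "Personal Genome Project",
}


def study_for_file(path):
    """Return the study name for a file whose basename starts with a known prefix.

    Assumes (hypothetically) that the prefix is separated from the rest of the
    name by '_' or '.'. Returns None for unrecognized prefixes.
    """
    name = os.path.basename(path)
    prefix = re.split(r"[_.]", name, maxsplit=1)[0]
    return STUDY_PREFIXES.get(prefix)
```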

File tags

The BIOM table filenames include tags that describe the data they contain:

  • 100nt - The sequences were trimmed to 100 nucleotides prior to OTU picking
  • even1k - The full table was rarefied to 1000 sequences per sample
  • even10k - The full table was rarefied to 10000 sequences per sample
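Rarefying to an even depth means randomly subsampling each sample, without replacement, down to a fixed number of sequences so that samples are compared at equal sequencing effort. The sketch below illustrates the idea on a plain `{sample_id: {otu_id: count}}` dict rather than the BIOM format itself; the function names are hypothetical, and dropping samples below the target depth is a common convention assumed here, not a description of the exact pipeline that produced the even1k/even10k tables.

```python
import random
from collections import Counter


def rarefy_sample(counts, depth, rng):
    """Subsample one sample's OTU counts to exactly `depth` sequences, without replacement.

    Returns None if the sample has fewer than `depth` sequences.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))


def rarefy_table(table, depth, seed=0):
    """Rarefy every sample in {sample_id: {otu_id: count}}; drop samples below `depth`."""
    rng = random.Random(seed)  # seeded for reproducible subsampling
    rarefied = {}
    for sample_id, counts in table.items():
        sub = rarefy_sample(counts, depth, rng)
        if sub is not None:
            rarefied[sample_id] = sub
    return rarefied
```

For example, `rarefy_table(table, 1000)` corresponds to the even1k idea and `rarefy_table(table, 10000)` to even10k. Expanding counts into a list is simple but memory-hungry for deep samples; it is chosen here for clarity.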

Trimming is necessary when combining data from studies that used different sequencing technologies (e.g., HiSeq vs. MiSeq), because those platforms produce reads of different lengths.
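A minimal sketch of trimming reads to a common length before OTU picking, assuming reads are given as `(read_id, sequence)` pairs that are already quality-filtered and start at the same primer position. Dropping reads shorter than the target length is an assumption made here so that every retained read covers the same region; `trim_reads` is a hypothetical helper, not the project's tooling.

```python
def trim_reads(reads, length=100):
    """Trim each (read_id, sequence) pair to its first `length` nucleotides.

    Reads shorter than `length` are dropped so all retained reads have equal length.
    """
    trimmed = []
    for read_id, seq in reads:
        if len(seq) >= length:
            trimmed.append((read_id, seq[:length]))
    return trimmed
```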

Debug data

The debug data files are derived from the main data files, but each is a random 10% subset (by sample) of the corresponding main file. The debug files exist to reduce the processing load on the results framework during testing.
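Subsetting "by sample" means selecting whole samples (table columns and mapping-file rows), not individual sequences. The sketch below draws a reproducible 10% of sample IDs and filters a table held as a `{sample_id: {otu_id: count}}` dict; the helper names, the seed, and the rounding rule are assumptions for illustration, not how the debug files were actually generated. The same selected IDs would also be used to filter the mapping file so table and metadata stay in sync.

```python
import random


def debug_subset(sample_ids, fraction=0.1, seed=42):
    """Pick a random `fraction` of sample IDs (whole samples, not sequences)."""
    ids = sorted(sample_ids)  # sort first so the result does not depend on input order
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(ids, k))


def subset_table(table, keep):
    """Keep only the samples of {sample_id: {otu_id: count}} whose IDs are in `keep`."""
    keep = set(keep)
    return {sid: counts for sid, counts in table.items() if sid in keep}
```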