Data, code and analysis results for the Earth Microbiome Project.
Latest commit d6d4bda on Feb 11, 2017 by @cuttlefishh: Merge pull request #72 from smirarab/master (add sepp script)

Earth Microbiome Project

The Earth Microbiome Project (EMP) is a systematic attempt to characterize the global microbial taxonomic and functional diversity for the benefit of the planet and mankind.

The EMP is open science: anyone can get involved. The EMP data set is generated from samples that individual researchers have compiled and donated to the EMP. Each of these data sets represents an individual EMP study. In addition to the individual studies, we are performing a cross-study meta-analysis. All per-study raw data are publicly available through the EMP portal to the QIIME database. This repository contains the processed, combined (i.e., cross-study) EMP data for the meta-analysis, code developed specifically for the EMP meta-analyses, and new results as they are generated.

If you're interested in getting involved in EMP data analyses, you should begin by reviewing the open issues. These describe analyses that we're interested in performing across studies. If you're interested in working on one of these analyses, or have ideas for other analyses that should be performed, you should get in touch with Greg Caporaso, the Chief Data Analyst for the EMP.

Additional information is available on the Earth Microbiome Project website.

Organization of this repository

  • data/ data files used for downstream analysis (BIOM tables, trees, mapping files, etc.)

    • data-urls.txt URLs where large data files can be found (e.g., BIOM and tree files). These are not stored in the repository, due to space limitations. You can download all of these files by running wget -i data-urls.txt from this directory.
      • emp-or.tre.gz Newick-formatted tree corresponding to the open-reference (or) BIOM table
      • emp-or-mc2.biom.gz open-reference (or) BIOM table
      • emp-or-mc2-w-tax.biom.gz open-reference (or) BIOM table with taxonomy assignments
      • emp-or-mc2-w-tax-no-pynast-failures.biom.gz open-reference (or) BIOM table with taxonomy assignments, excluding the OTUs that failed to align using PyNAST
      • emp-cr.biom.gz closed-reference (cr) BIOM table
      • refseqs.fna.gz the new reference sequence collection resulting from open-reference (or) OTU picking
      • sample-map.txt.gz sample metadata (i.e., mapping) file for all samples in the BIOM table
      • observation-map.txt.gz observation (OTU) metadata (e.g., taxonomy assignments) for the open-reference (or) BIOM table
  • code/ code developed for EMP analysis

  • results/ high-level results (e.g., figures that are useful for presentations)

  • presentations/ collection of presentations on the EMP
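
For environments without wget, the download step described for data-urls.txt can be approximated in Python using only the standard library. This is a minimal sketch, not part of the repository: the function names are illustrative, and it assumes data-urls.txt holds one URL per line. Unlike wget, it skips files that already exist locally, which is a choice made here for convenience.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def read_urls(list_path):
    """Return the non-empty lines of a URL list file, stripped of whitespace."""
    lines = Path(list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def local_name(url):
    """Return the file name a URL is saved under (its last path component)."""
    return PurePosixPath(urlparse(url).path).name


def download_all(list_path, dest_dir="."):
    """Download every URL in the list into dest_dir, skipping existing files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in read_urls(list_path):
        target = dest / local_name(url)
        if target.exists():
            continue  # already downloaded; assumed complete
        urlretrieve(url, str(target))
```

Calling `download_all("data-urls.txt")` from the data directory should fetch the same files as `wget -i data-urls.txt`, provided the assumption about the file layout holds.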

File name abbreviation conventions
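
As a rough illustration, the abbreviations spelled out in the file descriptions (or for open reference, cr for closed reference) can be expanded programmatically. The sketch below is hypothetical and not part of the repository: the dictionary and the function name are made up for this example, and any token it does not know (such as mc2) is passed through unchanged rather than guessed at.

```python
# Abbreviations taken from the file descriptions in this README.
# Tokens not listed here (for example "mc2") are deliberately left as-is.
KNOWN_ABBREVIATIONS = {
    "or": "open reference",
    "cr": "closed reference",
}


def describe(file_name):
    """Split an EMP data file name into tokens and expand known abbreviations."""
    stem = file_name.split(".", 1)[0]  # drop extensions such as ".biom.gz"
    tokens = stem.split("-")
    return [KNOWN_ABBREVIATIONS.get(token, token) for token in tokens]
```

For instance, `describe("emp-cr.biom.gz")` yields the tokens `emp` and `closed reference`.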

Finding older data

If you're looking for data generated and used for the ISME 14 EMP presentations, see here.