<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>The biom file format — biom-format.org</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../_static/haiku.css" />
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>
<script src="../_static/copybutton.js"></script>
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="The biom file format: Version 1.0" href="format_versions/biom-1.0.html" />
<link rel="prev" title="BIOM Documentation" href="index.html" />
</head><body>
<a href="https://github.com/biocore/biom-format"><img
style="position: absolute; top: 0; right: 0; border: 0;"
src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png"
alt="Fork me on GitHub"></a>
<div class="header" role="banner"><h1 class="heading"><a href="../index.html">
<span>biom-format.org</span></a></h1>
<h2 class="heading"><span>The biom file format</span></h2>
</div>
<div class="topnav" role="navigation" aria-label="top navigation">
<p>
«  <a href="index.html">BIOM Documentation</a>
  ::  
<a class="uplink" href="../index.html">Contents</a>
  ::  
<a href="format_versions/biom-1.0.html">The biom file format: Version 1.0</a>  »
</p>
</div>
<div class="content" role="main">
<section id="the-biom-file-format">
<span id="biom-format"></span><h1>The biom file format<a class="headerlink" href="#the-biom-file-format" title="Permalink to this heading">¶</a></h1>
<p>The BIOM project consists of two independent components: the <cite>biom-format</cite> software package, which contains tools for working with BIOM-formatted files and the tables they represent, and the BIOM file format itself. As of the 1.0.0 software version and the 1.0 file format version, the software and the file format are versioned independently of one another. Version-specific documentation of the file formats can be found on the following pages.</p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="format_versions/biom-1.0.html">The biom file format: Version 1.0</a><ul>
<li class="toctree-l2"><a class="reference internal" href="format_versions/biom-1.0.html#example-biom-files">Example biom files</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="format_versions/biom-2.0.html">The biom file format: Version 2.0</a><ul>
<li class="toctree-l2"><a class="reference internal" href="format_versions/biom-2.0.html#example-biom-files">Example biom files</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="format_versions/biom-2.1.html">The biom file format: Version 2.1</a><ul>
<li class="toctree-l2"><a class="reference internal" href="format_versions/biom-2.1.html#example-biom-files">Example biom files</a></li>
</ul>
</li>
</ul>
</div>
<p>Release versions contain three integers in the following format: <code class="docutils literal notranslate"><span class="pre">major-version.minor-version.micro-version</span></code>. When <code class="docutils literal notranslate"><span class="pre">-dev</span></code> is appended to the end of a version string, it indicates a development (or between-release) version. For example, <code class="docutils literal notranslate"><span class="pre">1.0.0-dev</span></code> would refer to the development version following the 1.0.0 release.</p>
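<p>The version string convention described above can be checked mechanically. The following is a minimal sketch in Python; <code class="docutils literal notranslate"><span class="pre">parse_version</span></code> is a hypothetical helper written for this illustration and is not part of the biom-format package.</p>

```python
import re

# Hypothetical helper for illustration; not part of the biom-format package.
# Matches three integers separated by dots, with an optional "-dev" suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(-dev)?$")


def parse_version(text):
    """Return (major, minor, micro, is_dev) for a release-style version string."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError("not a major.minor.micro version string: %r" % text)
    major, minor, micro = (int(part) for part in match.groups()[:3])
    # Group 4 is only set when the "-dev" suffix is present.
    return major, minor, micro, match.group(4) is not None


print(parse_version("1.0.0"))      # release version
print(parse_version("1.0.0-dev"))  # development version following 1.0.0
```

<p>Under these assumptions, a string such as <code class="docutils literal notranslate"><span class="pre">1.0</span></code> is rejected, because a release version must contain all three integers.</p>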
</section>
<section id="tips-and-faqs-regarding-the-biom-file-format">
<span id="sparse-or-dense"></span><h1>Tips and FAQs regarding the BIOM file format<a class="headerlink" href="#tips-and-faqs-regarding-the-biom-file-format" title="Permalink to this heading">¶</a></h1>
<section id="motivation-for-the-biom-format">
<h2>Motivation for the BIOM format<a class="headerlink" href="#motivation-for-the-biom-format" title="Permalink to this heading">¶</a></h2>
<p>The BIOM format was motivated by three goals: first, to facilitate efficient handling and storage of large, sparse biological contingency tables; second, to support encapsulation of core study data (contingency table data and sample/observation metadata) in a single file; and third, to facilitate the exchange of these tables between tools that support this format (e.g., passing data between <a class="reference external" href="http://www.qiime.org">QIIME</a>, <a class="reference external" href="http://metagenomics.anl.gov">MG-RAST</a>, and <a class="reference external" href="http://vamps.mbl.edu/">VAMPS</a>).</p>
<section id="efficient-handling-and-storage-of-very-large-tables">
<h3>Efficient handling and storage of very large tables<a class="headerlink" href="#efficient-handling-and-storage-of-very-large-tables" title="Permalink to this heading">¶</a></h3>
<p>In <a class="reference external" href="http://www.qiime.org">QIIME</a>, we began hitting limitations with OTU table objects when working with thousands of samples and hundreds of thousands of OTUs. In the near future we expect that we’ll be dealing with hundreds of thousands of samples in single analyses.</p>
<p>The OTU table format up to QIIME 1.4.0 involved a dense matrix: if an OTU was not observed in a given sample, that was indicated with a zero. We now primarily represent OTU tables in a sparse format: if an OTU is not observed in a sample, no count is stored for that OTU and sample. Both representations are shown here.</p>
<p>A dense representation of an OTU table:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">OTU</span> <span class="n">ID</span> <span class="n">PC</span><span class="mf">.354</span> <span class="n">PC</span><span class="mf">.355</span> <span class="n">PC</span><span class="mf">.356</span>
<span class="n">OTU0</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">4</span>
<span class="n">OTU1</span> <span class="mi">6</span> <span class="mi">0</span> <span class="mi">0</span>
<span class="n">OTU2</span> <span class="mi">1</span> <span class="mi">0</span> <span class="mi">7</span>
<span class="n">OTU3</span> <span class="mi">0</span> <span class="mi">0</span> <span class="mi">3</span>
</pre></div>
</div>
<p>A sparse representation of an OTU table:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">PC</span><span class="mf">.354</span> <span class="n">OTU1</span> <span class="mi">6</span>
<span class="n">PC</span><span class="mf">.354</span> <span class="n">OTU2</span> <span class="mi">1</span>
<span class="n">PC</span><span class="mf">.356</span> <span class="n">OTU0</span> <span class="mi">4</span>
<span class="n">PC</span><span class="mf">.356</span> <span class="n">OTU2</span> <span class="mi">7</span>
<span class="n">PC</span><span class="mf">.356</span> <span class="n">OTU3</span> <span class="mi">3</span>
</pre></div>
</div>
<p>OTU table data tends to be sparse (e.g., greater than 90% of counts are zero, and frequently as many as 99% of counts are zero), in which case the sparse representation is more convenient to work with because it has a smaller memory footprint. biom-format 1.0.0 supports both representations via dense and sparse Table types. biom-format 2.x supports only the sparse representation because, in practice, the dense representation was not useful, especially as study designs use increasing numbers of samples.</p>
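<p>As a minimal sketch of the relationship between the two representations, the following Python converts the dense example table shown above into its sparse form. The function <code class="docutils literal notranslate"><span class="pre">dense_to_sparse</span></code> and the variable names are assumptions made for this illustration; this is not how the biom-format package stores tables internally.</p>

```python
# Dense table from the example above: one row per OTU, one column per sample.
sample_ids = ["PC.354", "PC.355", "PC.356"]
otu_ids = ["OTU0", "OTU1", "OTU2", "OTU3"]
dense = [
    [0, 0, 4],
    [6, 0, 0],
    [1, 0, 7],
    [0, 0, 3],
]


def dense_to_sparse(dense, otu_ids, sample_ids):
    """Return (sample, OTU, count) triples for nonzero cells, grouped by sample."""
    triples = []
    for col, sample in enumerate(sample_ids):
        for row, otu in enumerate(otu_ids):
            count = dense[row][col]
            if count != 0:  # zero counts are simply not stored
                triples.append((sample, otu, count))
    return triples


sparse = dense_to_sparse(dense, otu_ids, sample_ids)
for sample, otu, count in sparse:
    print(sample, otu, count)

# The dense form holds every cell; the sparse form holds only nonzero ones.
total_cells = len(otu_ids) * len(sample_ids)
print(len(sparse), "of", total_cells, "cells stored")
```

<p>For this small table the sparse form keeps 5 of 12 cells. The saving grows with the proportion of zero counts, which is why it matters for tables where 90% or more of the counts are zero.</p>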
</section>
<section id="encapsulation-of-core-study-data-otu-table-data-and-sample-otu-metadata-in-a-single-file">
<h3>Encapsulation of core study data (OTU table data and sample/OTU metadata) in a single file<a class="headerlink" href="#encapsulation-of-core-study-data-otu-table-data-and-sample-otu-metadata-in-a-single-file" title="Permalink to this heading">¶</a></h3>
<p>Formats such as JSON and HDF5 allow more efficient storage of highly sparse data and allow arbitrary amounts of sample and OTU metadata to be stored in a single file. Sample metadata corresponds to what is generally found in QIIME mapping files. At this stage, inclusion of this information in the OTU table file is optional, but it may be useful for sharing these files with other QIIME users and for publishing or archiving results of analyses. OTU metadata (generally a taxonomic assignment for an OTU) is also optional. In contrast to the previous OTU table format, you can now store more than one OTU metadata value in this field, so, for example, you can store taxonomic assignments from two different taxonomic assignment approaches.</p>
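<p>The idea of keeping counts and metadata together can be sketched with a plain Python dictionary serialized as JSON. The layout below is hypothetical: the key names and all metadata values are made up for this illustration and should not be read as the BIOM specification, which is documented on the version-specific format pages.</p>

```python
import json

# Hypothetical layout for illustration only. Key names and metadata values are
# invented for this sketch and are not taken from the BIOM specification.
table = {
    # Samples, each with optional metadata (as found in a mapping file).
    "columns": [
        {"id": "PC.354", "metadata": {"group": "A"}},
        {"id": "PC.355", "metadata": None},
        {"id": "PC.356", "metadata": {"group": "B"}},
    ],
    # OTUs, each with optional metadata. OTU0 carries two taxonomic
    # assignments, one per (made-up) assignment approach.
    "rows": [
        {"id": "OTU0", "metadata": {"taxonomy_method_1": ["Bacteria", "Firmicutes"],
                                    "taxonomy_method_2": ["Bacteria"]}},
        {"id": "OTU1", "metadata": None},
        {"id": "OTU2", "metadata": None},
        {"id": "OTU3", "metadata": None},
    ],
    # Sparse counts as [row index, column index, count] triples.
    "data": [[1, 0, 6], [2, 0, 1], [0, 2, 4], [2, 2, 7], [3, 2, 3]],
}

# Counts and both kinds of metadata travel together in one serialized document.
text = json.dumps(table)
restored = json.loads(text)
print(sorted(restored["rows"][0]["metadata"]))
```

<p>Because metadata is optional in this sketch, entries without it carry a null value, and an entry that has metadata can hold any number of named fields.</p>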
</section>
<section id="facilitating-the-use-of-tables-between-tools-that-support-this-format">
<h3>Facilitating the use of tables between tools that support this format<a class="headerlink" href="#facilitating-the-use-of-tables-between-tools-that-support-this-format" title="Permalink to this heading">¶</a></h3>
<p>Different tools, such as <a class="reference external" href="http://www.qiime.org">QIIME</a>, <a class="reference external" href="http://metagenomics.anl.gov">MG-RAST</a>, and <a class="reference external" href="http://vamps.mbl.edu/">VAMPS</a>, work with similar data structures that represent different types of data. An example of this is a <cite>metagenome</cite> table that could be generated by MG-RAST (where, for example, columns are metagenomes and rows are functional categories). Exporting this data from MG-RAST in a suitable format will allow many of the QIIME tools to be applied to it (such as generation of alpha rarefaction plots or beta diversity ordination plots). This new format is far more general than previous formats, so it will support adoption by groups working with different data types, and it is already being integrated to support transfer of data between <a class="reference external" href="http://www.qiime.org">QIIME</a>, <a class="reference external" href="http://metagenomics.anl.gov">MG-RAST</a>, and <a class="reference external" href="http://vamps.mbl.edu/">VAMPS</a>.</p>
</section>
</section>
<section id="file-extension">
<h2>File extension<a class="headerlink" href="#file-extension" title="Permalink to this heading">¶</a></h2>
<p>We recommend that BIOM files use the <code class="docutils literal notranslate"><span class="pre">.biom</span></code> extension.</p>
</section>
</section>
</div>
<div class="bottomnav" role="navigation" aria-label="bottom navigation">
<p>
«  <a href="index.html">BIOM Documentation</a>
  ::  
<a class="uplink" href="../index.html">Contents</a>
  ::  
<a href="format_versions/biom-1.0.html">The biom file format: Version 1.0</a>  »
</p>
</div>
<div class="footer" role="contentinfo">
© Copyright 2011-2022 The BIOM Format Development Team.
Last updated on May 10, 2023.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.0.0.
</div>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-6636235-6']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</body>
</html>