Fetching contributors…
Cannot retrieve contributors at this time
206 lines (181 sloc) 17.9 KB
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="">
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>STX B+ Tree Template Classes: STX B+ Tree Template Classes README</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
<div id="top"><!-- do not remove this div! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">STX B+ Tree Template Classes
&#160;<span id="projectnumber">0.9</span>
<!-- Generated by Doxygen -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
<div id="navrow1" class="tabs">
<ul class="tablist">
<li class="current"><a href="index.html"><span>Main&#160;Page</span></a></li>
<li><a href="namespaces.html"><span>Namespaces</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
<div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
<input type="text" id="MSearchField" value="Search" accesskey="S"
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
<div class="header">
<div class="headertitle">
<div class="title">STX B+ Tree Template Classes README </div> </div>
<div class="contents">
<div class="textblock"><dl class="author"><dt><b>Author:</b></dt><dd>Timo Bingmann (Mail: tb a-with-circle panthema dot net) </dd></dl>
<dl class="date"><dt><b>Date:</b></dt><dd>2013-05-05, 2011-05-17 and 2008-09-07</dd></dl>
<h2><a class="anchor" id="sec1"></a>
<p>The STX B+ Tree package is a set of C++ template classes implementing a B+ tree key/data container in main memory. The classes are designed as drop-in replacements of the STL containers <code>set</code>, <code>map</code>, <code>multiset</code> and <code>multimap</code> and follow their interfaces very closely. By packing multiple value pairs into each node of the tree the B+ tree reduces heap fragmentation and utilizes cache-line effects better than the standard red-black binary tree. The tree algorithms are based on the implementation in Cormen, Leiserson and Rivest's Introduction into Algorithms, Jan Jannink's paper and other algorithm resources. The classes contain extensive assertion and verification mechanisms to ensure the implementation's correctness by testing the tree invariants. To illustrate the B+ tree's structure a wxWidgets demo program is included in the source package.</p>
<h2><a class="anchor" id="sec2"></a>
Website / API Docs / Bugs / License</h2>
<p>The current source package can be downloaded from <a href=""></a></p>
<p>The include files are extensively documented using doxygen. The compiled doxygen html documentation is included in the source package. It can also be viewed online at <a href=""></a> (if you are not reading it right now).</p>
<p>The wxWidgets B+ tree demo program is located in the directory wxbtreedemo. Compiled binary versions can be found on the package web page mentioned above.</p>
<p>If bugs should become known they will be posted on the above web page together with patches or corrected versions.</p>
<p>The B+ tree template source code is released under the Boost Software License, Version 1.0, which can be found at the header of each include file.</p>
<p>All auxiliary programs like the wxWidgets demo, test suite and speed tests are licensed under the GNU General Public License v3 (GPLv3), which can be found in the file COPYING.GPLv3.</p>
<h2><a class="anchor" id="sec3"></a>
Original Idea</h2>
<p>The idea originally arose while coding a read-only database, which used a huge map of millions of non-sequential integer keys to 8-byte file offsets. When using the standard STL red-black tree implementation this would yield millions of 20-byte heap allocations and very slow search times due to the tree's height. So the original intention was to reduce memory fragmentation and improve search times. The B+ tree solves this by packing multiple data pairs into one node with a large number of descendant nodes.</p>
<p>In computer science lectures it is often stated that using consecutive bytes in memory would be more cache-efficient, because the CPU's cache levels always fetch larger blocks from main memory. So it would be best to store the keys of a node in one continuous array. This way the inner scanning loop would be accelerated by benefiting from cache effects and pipelining speed-ups. Thus the cost of scanning for a matching key would be lower than in a red-black tree, even though the number of key comparisons are theoretically larger. This second aspect aroused my academic interest and resulted in the <a class="el" href="index.html#sec10">speed test experiments</a>.</p>
<p>A third inspiration was that no working C++ template implementation of a B+ tree could be found on the Internet. Now this one can be found.</p>
<h2><a class="anchor" id="sec4"></a>
Implementation Overview</h2>
<p>This implementation contains five main classes within the <a class="el" href="a00036.html">stx</a> namespace (blandly named Some Template eXtensions). The base class <a class="el" href="a00001.html">btree</a> implements the B+ tree algorithms using inner and leaf nodes in main memory. Almost all STL-required function calls are implemented (see below for the exceptions). The asymptotic time requirements of the STL standard are theoretically not always fulfilled. However in practice this B+ tree performs better than the STL's red-black tree at the cost of using more memory. See the <a class="el" href="index.html#sec10">speed test results</a> for details.</p>
<p>The base class is then specialized into <a class="el" href="a00009.html">btree_set</a>, <a class="el" href="a00006.html">btree_multiset</a>, <a class="el" href="a00004.html">btree_map</a> and <a class="el" href="a00005.html">btree_multimap</a> using default template parameters and facade functions. These classes are designed to be drop-in replacements for the corresponding STL containers.</p>
<p>The insertion function splits the nodes on recursion unroll. Erase is largely based on Jannink's ideas. See <a href=""></a> for his paper on "Implementing Deletion in B+-trees".</p>
<p>The two set classes (<a class="el" href="a00009.html">btree_set</a> and <a class="el" href="a00006.html">btree_multiset</a>) are derived from the base implementation <a class="el" href="a00001.html">class btree</a> by specifying an empty struct as data_type. All functions are adapted to provide the base class with empty placeholder objects. Note that it is somewhat inefficient to implement a set or multiset using a B+ tree: a plain B tree (without +) would hold no extra copies of the keys. The main focus was on implementing the maps.</p>
<h2><a class="anchor" id="sec5"></a>
Problem with Separated Key/Data Arrays</h2>
<p>The most noteworthy difference to the default red-black tree implementation of std::map is that the B+ tree does not hold key/data pairs together in memory. Instead each B+ tree node has two separate arrays containing keys and data values. This design was chosen to utilize cache-line effects while scanning the key array.</p>
<p>However it also directly generates many problems in implementing the iterators' operators. These return a (writable) reference or pointer to a value_type, which is a std::pair composition. These data/key pairs however are not stored together and thus a temporary copy must be constructed. This copy should not be written to, because it is not stored back into the B+ tree. This effectively prohibits use of many STL algorithms which writing to the B+ tree's iterators. I would be grateful for hints on how to resolve this problem without folding the key and data arrays.</p>
<h2><a class="anchor" id="sec6"></a>
Test Suite</h2>
<p>The B+ tree distribution contains an extensive test suite. According to gcov 91.9% of the <a class="el" href="a00026.html" title="Contains the main B+ tree implementation template class btree.">btree.h</a> implementation is covered.</p>
<h2><a class="anchor" id="sec7"></a>
STL Incompatibilities</h2>
<h3><a class="anchor" id="sec7-1"></a>
Key and Data Type Requirements</h3>
<p>The tree algorithms currently do not use copy-construction. All key/data items are allocated in the nodes using the default-constructor and are subsequently only assigned new data (using <code>operator=</code>).</p>
<h3><a class="anchor" id="sec7-2"></a>
Iterators' Operators</h3>
<p>The most important incompatibility are the non-writable <code>operator*</code> and <code>operator-&gt;</code> of the <a class="el" href="a00016.html">iterator</a>. See above for a discussion of the problem on separated key/data arrays. Instead of <code>*iter</code> and <code>iter-&gt;</code> use the new function <code></code> which returns a writable reference to the data value in the tree.</p>
<h3><a class="anchor" id="sec7-3"></a>
Erase Functions</h3>
<p>The B+ tree supports three erase functions:</p>
<div class="fragment"><pre class="fragment">size_type erase(<span class="keyword">const</span> key_type &amp;key); <span class="comment">// erase all data pairs matching key</span>
<span class="keywordtype">bool</span> erase_one(<span class="keyword">const</span> key_type &amp;key); <span class="comment">// erase one data pair matching key</span>
<span class="keywordtype">void</span> erase(iterator iter); <span class="comment">// erase pair referenced by iter</span>
</pre></div><p>The following STL-required function is currently not supported:</p>
<div class="fragment"><pre class="fragment"><span class="keywordtype">void</span> erase(iterator first, iterator last);
</pre></div><h2><a class="anchor" id="sec8"></a>
<p>Beyond the usual STL interface the B+ tree classes support some extra goodies.</p>
<div class="fragment"><pre class="fragment"><span class="comment">// Bulk load a sorted range. Loads items into leaves and constructs a</span>
<span class="comment">// B-tree above them. The tree must be empty when calling this function.</span>
<span class="keyword">template</span> &lt;<span class="keyword">typename</span> Iterator&gt;
<span class="keywordtype">void</span> bulk_load(Iterator ibegin, Iterator iend);
<span class="comment">// Output the tree in a pseudo-hierarchical text dump to std::cout. This</span>
<span class="comment">// function requires that BTREE_DEBUG is defined prior to including the btree</span>
<span class="comment">// headers. Furthermore the key and data types must be std::ostream printable.</span>
<span class="keywordtype">void</span> print() <span class="keyword">const</span>;
<span class="comment">// Run extensive checks of the tree invariants. If a corruption in found the</span>
<span class="comment">// program will abort via assert(). See below on enabling auto-verification.</span>
<span class="keywordtype">void</span> verify() <span class="keyword">const</span>;
<span class="comment">// Serialize and restore the B+ tree nodes and data into/from a binary image.</span>
<span class="comment">// This requires that the key and data types are integral and contain no</span>
<span class="comment">// outside pointers or references.</span>
<span class="keywordtype">void</span> dump(std::ostream &amp;os) <span class="keyword">const</span>;
<span class="keywordtype">bool</span> restore(std::istream &amp;is);
</pre></div><h2><a class="anchor" id="sec9"></a>
B+ Tree Traits</h2>
<p>All tree template classes take a template parameter structure which holds important options of the implementation. The following structure shows which static variables specify the options and the corresponding defaults:</p>
<div class="fragment"><pre class="fragment"><span class="keyword">struct </span>btree_default_map_traits
<span class="comment">// If true, the tree will self verify it&#39;s invariants after each insert()</span>
<span class="comment">// or erase(). The header must have been compiled with BTREE_DEBUG</span>
<span class="comment">// defined.</span>
<span class="keyword">static</span> <span class="keyword">const</span> <span class="keywordtype">bool</span> selfverify = <span class="keyword">false</span>;
<span class="comment">// If true, the tree will print out debug information and a tree dump</span>
<span class="comment">// during insert() or erase() operation. The header must have been</span>
<span class="comment">// compiled with BTREE_DEBUG defined and key_type must be std::ostream</span>
<span class="comment">// printable.</span>
<span class="keyword">static</span> <span class="keyword">const</span> <span class="keywordtype">bool</span> debug = <span class="keyword">false</span>;
<span class="comment">// Number of slots in each leaf of the tree. Estimated so that each node</span>
<span class="comment">// has a size of about 256 bytes.</span>
<span class="keyword">static</span> <span class="keyword">const</span> <span class="keywordtype">int</span> leafslots =
MAX( 8, 256 / (<span class="keyword">sizeof</span>(_Key) + <span class="keyword">sizeof</span>(_Data)) );
<span class="comment">// Number of slots in each inner node of the tree. Estimated so that each</span>
<span class="comment">// node has a size of about 256 bytes.</span>
<span class="keyword">static</span> <span class="keyword">const</span> <span class="keywordtype">int</span> innerslots =
MAX( 8, 256 / (<span class="keyword">sizeof</span>(_Key) + <span class="keyword">sizeof</span>(<span class="keywordtype">void</span>*)) );
<span class="comment">// As of stx-btree-0.9, the code does linear search in find_lower() and</span>
<span class="comment">// find_upper() instead of binary_search, unless the node size is larger</span>
<span class="comment">// than this threshold. See notes at</span>
<span class="comment">//</span>
<span class="keyword">static</span> <span class="keyword">const</span> <span class="keywordtype">size_t</span> binsearch_threshold = 256;
</pre></div><h2><a class="anchor" id="sec10"></a>
Speed Tests</h2>
<p>See the web page <a href=""></a> for speed test results and a discussion thereof. </p>
</div></div><!-- contents -->
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Namespaces</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark">&#160;</span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark">&#160;</span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(9)"><span class="SelectionMark">&#160;</span>Friends</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(10)"><span class="SelectionMark">&#160;</span>Defines</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
<hr class="footer"/><address class="footer"><small>
Generated on Sun May 5 2013 23:38:44 for STX B+ Tree Template Classes by &#160;<a href="">
<img class="footer" src="doxygen.png" alt="doxygen"/>