bingmann committed Apr 25, 2007
1 parent d8d7113 commit 66a03e2acb95b5968e98dc036d3a5cfd6246d728
Showing with 873 additions and 0 deletions.
  1. +1 −0 AUTHORS
  2. +504 −0 COPYING
  3. 0 ChangeLog
  4. +234 −0 INSTALL
  5. 0 NEWS
  6. +134 −0 README
@@ -0,0 +1 @@
+Timo Bingmann <>

No changes.
@@ -0,0 +1,234 @@
+Installation Instructions
+Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005,
+2006 Free Software Foundation, Inc.
+This file is free documentation; the Free Software Foundation gives
+unlimited permission to copy, distribute and modify it.
+Basic Installation
+Briefly, the shell commands `./configure; make; make install' should
+configure, build, and install this package. The following
+more-detailed instructions are generic; see the `README' file for
+instructions specific to this package.
+ The `configure' shell script attempts to guess correct values for
+various system-dependent variables used during compilation. It uses
+those values to create a `Makefile' in each directory of the package.
+It may also create one or more `.h' files containing system-dependent
+definitions. Finally, it creates a shell script `config.status' that
+you can run in the future to recreate the current configuration, and a
+file `config.log' containing compiler output (useful mainly for
+debugging `configure').
+ It can also use an optional file (typically called `config.cache'
+and enabled with `--cache-file=config.cache' or simply `-C') that saves
+the results of its tests to speed up reconfiguring. Caching is
+disabled by default to prevent problems with accidental use of stale
+cache files.
+ If you need to do unusual things to compile the package, please try
+to figure out how `configure' could check whether to do them, and mail
+diffs or instructions to the address given in the `README' so they can
+be considered for the next release. If you are using the cache, and at
+some point `config.cache' contains results you don't want to keep, you
+may remove or edit it.
+ The file `configure.ac' (or `configure.in') is used to create
+`configure' by a program called `autoconf'. You need `configure.ac' if
+you want to change it or regenerate `configure' using a newer version
+of `autoconf'.
+The simplest way to compile this package is:
+ 1. `cd' to the directory containing the package's source code and type
+ `./configure' to configure the package for your system.
+ Running `configure' might take a while. While running, it prints
+ some messages telling which features it is checking for.
+ 2. Type `make' to compile the package.
+ 3. Optionally, type `make check' to run any self-tests that come with
+ the package.
+ 4. Type `make install' to install the programs and any data files and
+ documentation.
+ 5. You can remove the program binaries and object files from the
+ source code directory by typing `make clean'. To also remove the
+ files that `configure' created (so you can compile the package for
+ a different kind of computer), type `make distclean'. There is
+ also a `make maintainer-clean' target, but that is intended mainly
+ for the package's developers. If you use it, you may have to get
+ all sorts of other programs in order to regenerate files that came
+ with the distribution.
+Compilers and Options
+Some systems require unusual options for compilation or linking that the
+`configure' script does not know about. Run `./configure --help' for
+details on some of the pertinent environment variables.
+ You can give `configure' initial values for configuration parameters
+by setting variables in the command line or in the environment. Here
+is an example:
+ ./configure CC=c99 CFLAGS=-g LIBS=-lposix
+ *Note Defining Variables::, for more details.
+Compiling For Multiple Architectures
+You can compile the package for more than one kind of computer at the
+same time, by placing the object files for each architecture in their
+own directory. To do this, you can use GNU `make'. `cd' to the
+directory where you want the object files and executables to go and run
+the `configure' script. `configure' automatically checks for the
+source code in the directory that `configure' is in and in `..'.
+ With a non-GNU `make', it is safer to compile the package for one
+architecture at a time in the source code directory. After you have
+installed the package for one architecture, use `make distclean' before
+reconfiguring for another architecture.
+Installation Names
+By default, `make install' installs the package's commands under
+`/usr/local/bin', include files under `/usr/local/include', etc. You
+can specify an installation prefix other than `/usr/local' by giving
+`configure' the option `--prefix=PREFIX'.
+ You can specify separate installation prefixes for
+architecture-specific files and architecture-independent files. If you
+pass the option `--exec-prefix=PREFIX' to `configure', the package uses
+PREFIX as the prefix for installing programs and libraries.
+Documentation and other data files still use the regular prefix.
+ In addition, if you use an unusual directory layout you can give
+options like `--bindir=DIR' to specify different values for particular
+kinds of files. Run `configure --help' for a list of the directories
+you can set and what kinds of files go in them.
+ If the package supports it, you can cause programs to be installed
+with an extra prefix or suffix on their names by giving `configure' the
+option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
+Optional Features
+Some packages pay attention to `--enable-FEATURE' options to
+`configure', where FEATURE indicates an optional part of the package.
+They may also pay attention to `--with-PACKAGE' options, where PACKAGE
+is something like `gnu-as' or `x' (for the X Window System). The
+`README' should mention any `--enable-' and `--with-' options that the
+package recognizes.
+ For packages that use the X Window System, `configure' can usually
+find the X include and library files automatically, but if it doesn't,
+you can use the `configure' options `--x-includes=DIR' and
+`--x-libraries=DIR' to specify their locations.
+Specifying the System Type
+There may be some features `configure' cannot figure out automatically,
+but needs to determine by the type of machine the package will run on.
+Usually, assuming the package is built to be run on the _same_
+architectures, `configure' can figure that out, but if it prints a
+message saying it cannot guess the machine type, give it the
+`--build=TYPE' option. TYPE can either be a short name for the system
+type, such as `sun4', or a canonical name which has the form:
+ CPU-COMPANY-SYSTEM
+where SYSTEM can have one of these forms:
+ OS KERNEL-OS
+ See the file `config.sub' for the possible values of each field. If
+`config.sub' isn't included in this package, then this package doesn't
+need to know the machine type.
+ If you are _building_ compiler tools for cross-compiling, you should
+use the option `--target=TYPE' to select the type of system they will
+produce code for.
+ If you want to _use_ a cross compiler, that generates code for a
+platform different from the build platform, you should specify the
+"host" platform (i.e., that on which the generated programs will
+eventually be run) with `--host=TYPE'.
+Sharing Defaults
+If you want to set default values for `configure' scripts to share, you
+can create a site shell script called `config.site' that gives default
+values for variables like `CC', `cache_file', and `prefix'.
+`configure' looks for `PREFIX/share/config.site' if it exists, then
+`PREFIX/etc/config.site' if it exists. Or, you can set the
+`CONFIG_SITE' environment variable to the location of the site script.
+A warning: not all `configure' scripts look for a site script.
+Defining Variables
+Variables not defined in a site shell script can be set in the
+environment passed to `configure'. However, some packages may run
+configure again during the build, and the customized values of these
+variables may be lost. In order to avoid this problem, you should set
+them in the `configure' command line, using `VAR=value'. For example:
+ ./configure CC=/usr/local2/bin/gcc
+causes the specified `gcc' to be used as the C compiler (unless it is
+overridden in the site shell script).
+Unfortunately, this technique does not work for `CONFIG_SHELL' due to
+an Autoconf bug. Until the bug is fixed you can use this workaround:
+ CONFIG_SHELL=/bin/bash /bin/bash ./configure CONFIG_SHELL=/bin/bash
+`configure' Invocation
+`configure' recognizes the following options to control how it operates.
+`--help'
+`-h'
+ Print a summary of the options to `configure', and exit.
+`--version'
+`-V'
+ Print the version of Autoconf used to generate the `configure'
+ script, and exit.
+`--cache-file=FILE'
+ Enable the cache: use and save the results of the tests in FILE,
+ traditionally `config.cache'. FILE defaults to `/dev/null' to
+ disable caching.
+`--config-cache'
+`-C'
+ Alias for `--cache-file=config.cache'.
+`--quiet'
+`--silent'
+`-q'
+ Do not print messages saying which checks are being made. To
+ suppress all normal output, redirect it to `/dev/null' (any error
+ messages will still be shown).
+`--srcdir=DIR'
+ Look for the package's source code in directory DIR. Usually
+ `configure' can determine that directory automatically.
+`configure' also accepts some other, not widely useful, options. Run
+`configure --help' for more details.
No changes.
@@ -0,0 +1,134 @@
+ *** STX B-Tree C++ Implementation ***
+The STX B-Tree is a set of C++ template classes implementing a B+ tree in main
+memory. The classes are designed as drop-in replacements of the STL containers
+set, map, multiset and multimap and follow their interfaces very closely. But
+instead of the standard red-black binary tree, the key/data pairs are stored in
+a B+ tree with variable node size. The tree algorithms are based on the
+implementation in Cormen et al.'s Introduction to Algorithms, Jan Jannink's
+paper and other algorithm resources. The classes contain extensive assertion
+and verification mechanisms to ensure the implementation's correctness by
+testing the tree invariants.
+--- Original Idea ---
+The original idea was to group very small key/data pairs into larger memory
+pages. The initial application was a huge map of millions of non-sequential
+integer keys to 8-byte file offsets. When using the STL red-black tree
+implementation this would yield millions of 20-byte heap allocations and very
+slow search times due to the tree's height. The B+ tree packs multiple data
+pairs into one node thus reducing memory fragmentation, building a shallow tree
+and utilizing cache effects when scanning the key array.
+--- Implementation Overview ---
+This implementation contains five main classes within the "stx" namespace
+(blandly named Some Template eXtensions). The base class "btree" implements the B+ tree
+algorithms using allocation nodes in main memory. Almost all STL-required
+function calls are implemented (see below for the exceptions). The asymptotic
+time requirements of the STL standard are theoretically not always fulfilled.
+However in practice this B+ tree performs better than a red-black tree while
+utilizing more memory. See the speed test results below for details.
+The base class is then specialized into btree_set, btree_multiset, btree_map and
+btree_multimap using default template parameters and facade functions. These
+classes are designed to be drop-in replacements for the corresponding STL
+containers.
+The insertion function splits the nodes on recursion unroll. Erase is largely
+based on Jannink's ideas; see his paper on "Implementing Deletion in
+B+-trees".
+The set class is derived from the base implementation class btree by specifying
+an empty struct as data_type. All functions are adapted to provide the inner
+class with placeholder objects. Note that it is somewhat inefficient to
+implement a set or multiset using a B+ tree: a plain B tree would hold no extra
+copies of the keys.
+--- Problem with Separated Key/Data Arrays ---
+The most noteworthy difference from the default red-black implementation of
+std::map is that the B+ tree does not hold the key and data of a pair together in
+memory. Instead each B+ tree node has two separate arrays of keys and data
+values. This design was chosen to utilize cache-line effects while scanning the
+key array.
+However it also directly generates many problems in implementing the iterators'
+operators which return references or pointers to value_type composition
+pairs. These data/key pairs however are not stored together and thus a
+temporary copy must be constructed. This copy should not be written to, as any
+change to it is not stored back into the B+ tree. This effectively prohibits use of many STL
+algorithms writing to the B+ tree's iterators.
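The effect described above can be sketched with a toy leaf node. This is only an illustration under assumed names: `Leaf`, `LeafIterator`, `write_through_copy` and the fixed slot count are hypothetical and are not taken from the btree.h sources. Because the node never contains a key/data pair object, dereferencing has to assemble a copy, and a write to that copy never reaches the arrays.

```cpp
#include <cassert>
#include <utility>

// Hypothetical leaf layout: keys and data values live in separate arrays.
struct Leaf
{
    static const int slots = 4;
    int keys[slots];
    long data[slots];
    int used;
};

// Hypothetical iterator pointing at one slot of one leaf.
struct LeafIterator
{
    Leaf* leaf;
    int slot;

    // The node holds no std::pair object, so a copy is assembled on demand.
    std::pair<int, long> operator*() const
    {
        return std::make_pair(leaf->keys[slot], leaf->data[slot]);
    }

    // A direct accessor can hand out a reference into the data array.
    long& data() const { return leaf->data[slot]; }
};

// Tries to modify slot 0 through a dereferenced copy and returns what the
// leaf actually stores afterwards.
inline long write_through_copy(Leaf& leaf)
{
    assert(leaf.used > 0);
    LeafIterator it = { &leaf, 0 };
    std::pair<int, long> tmp = *it;
    tmp.second = 99; // changes only the temporary copy
    return leaf.data[0];
}
```

In this sketch only the reference-returning accessor changes the stored value, which is why algorithms that assign through a dereferenced iterator do not work on such a layout.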
+--- Test Suite ---
+The B+ tree distribution contains an extensive test suite using
+cppunit. According to gcov, 89.23% of the btree.h implementation is covered.
+--- STL Incompatibilities ---
+Most important are the non-writable operator* and operator-> of the
+iterator. See above for a discussion of the problem with separated key/data
+arrays.
+Instead of *iter and iter-> use the new functions iter.key() and
+iter.data(), which return writable references to the key and data values.
+The B+ tree supports only two erase functions:
+size_type erase(const key_type &key); // erase all data pairs matching key
+bool erase_one(const key_type &key); // erase one data pair matching key
+The following STL-required functions are not supported:
+void erase(iterator iter);
+void erase(iterator first, iterator last);
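The difference between the two supported erase functions can be illustrated with std::multimap as a stand-in. This is a sketch of the semantics only, not of the B+ tree code: the helper names `erase_all` and the free function `erase_one` are hypothetical, and the second helper relies on std::multimap's iterator-based erase, which is exactly the overload the B+ tree does not offer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

typedef std::multimap<int, long> Multi;

// Counterpart of erase(key): remove every pair with this key and
// return how many were removed.
inline std::size_t erase_all(Multi& m, int key)
{
    return m.erase(key);
}

// Counterpart of erase_one(key): remove a single matching pair, if any,
// and report whether something was removed.
inline bool erase_one(Multi& m, int key)
{
    Multi::iterator it = m.find(key);
    if (it == m.end())
        return false;
    m.erase(it);
    return true;
}
```

With three pairs under one key, a call to the second helper leaves two behind, while the first helper removes all remaining ones and returns their count.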
+--- Extensions ---
+Beyond the usual STL interface the B+ tree classes support some extra goodies.
+// Output the tree in a pseudo-hierarchical text dump to std::cout. This
+// function requires that BTREE_DEBUG is defined prior to including the btree
+// headers. Furthermore the key and data types must be std::ostream printable.
+void print() const;
+// Run extensive checks of the tree invariants. If a corruption is found the
+// program will abort via assert(). See below on enabling auto-verification.
+void verify() const;
+// Serialize and restore the B+ tree nodes and data into/from a binary image
+// written to the ostream or read from the istream. This requires that the key
+// and data types are integral and contain no outside pointers or references.
+void dump(std::ostream &os) const;
+bool restore(std::istream &is);
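Why the key and data types must be plain values without outside pointers can be seen in a minimal sketch of a raw binary image. The names `Slot`, `dump_slots` and `restore_slots` and the image layout (a count followed by the raw bytes) are assumptions made for illustration; the actual format used by dump() and restore() is not described in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical plain-data slot: copying its bytes copies its whole value.
struct Slot
{
    int key;
    long offset;
};

// Write a count followed by the raw bytes of the slots.
inline void dump_slots(std::ostream& os, const Slot* slots, std::size_t n)
{
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(reinterpret_cast<const char*>(slots), n * sizeof(Slot));
}

// Read the image back; returns false on a short read or if the stored
// count exceeds the capacity of the output buffer.
inline bool restore_slots(std::istream& is, Slot* out,
                          std::size_t capacity, std::size_t& n)
{
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    if (is.fail() || n > capacity)
        return false;
    is.read(reinterpret_cast<char*>(out), n * sizeof(Slot));
    return !is.fail();
}
```

A type holding a pointer or a std::string would survive this round trip only as stale addresses, which is the reason for the restriction stated in the comment above. Such an image is also tied to the byte order and type sizes of the machine that wrote it.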
+--- B+ Tree Traits ---
+All tree template classes take a template parameter structure which holds
+important options of the implementation. The following structure shows which
+static variables specify the options and the corresponding defaults:
+template <typename _Key, typename _Data>
+struct btree_default_map_traits
+{
+ /// If true, the tree will self-verify its invariants after each insert()
+ /// or erase(). The header must have been compiled with BTREE_DEBUG
+ /// defined.
+ static const bool selfverify = false;
+ /// If true, the tree will print out debug information and a tree dump
+ /// during insert() or erase() operation. The header must have been
+ /// compiled with BTREE_DEBUG defined and key_type must be std::ostream
+ /// printable.
+ static const bool debug = false;
+ /// Number of slots in each leaf of the tree. Estimated so that each node
+ /// has a size of about 128 bytes.
+ static const int leafslots =
+ MAX( 8, 128 / (sizeof(_Key) + sizeof(_Data)) );
+ /// Number of slots in each inner node of the tree. Estimated so that each
+ /// node has a size of about 128 bytes.
+ static const int innerslots =
+ MAX( 8, 128 / (sizeof(_Key) + sizeof(void*)) );
+};
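The slot defaults can be checked with a little arithmetic. The following is a sketch that mirrors the MAX( 8, 128 / ... ) expressions; the helper names `slots_per_node`, `leaf_slots` and `inner_slots` are hypothetical and do not appear in the btree headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: slots per node for a given per-entry size, following
// the "at least 8 slots, otherwise about 128 bytes per node" rule.
inline int slots_per_node(std::size_t entry_bytes)
{
    assert(entry_bytes > 0);
    const int by_size = static_cast<int>(128 / entry_bytes);
    return by_size > 8 ? by_size : 8;
}

// A leaf slot stores one key and one data value.
template <typename Key, typename Data>
int leaf_slots()
{
    return slots_per_node(sizeof(Key) + sizeof(Data));
}

// An inner slot stores one key and one child pointer.
template <typename Key>
int inner_slots()
{
    return slots_per_node(sizeof(Key) + sizeof(void*));
}
```

For 4-byte keys and 8-byte data values the rule gives 128 / 12 = 10 leaf slots. Once a key/data pair exceeds 16 bytes the minimum of 8 slots applies, so such nodes grow beyond 128 bytes.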
