STL 2.0

Dietmar Kühl edited this page Jun 3, 2015 · 17 revisions
Clone this wiki locally

STL 2.0

STL is a great library which demonstrated an entirely new approach to programming. For simple uses it tends to work very well but when things become any more complex it quickly becomes a lot harder to use STL. This article identifies problematic areas within STL and proposes approaches to solve them. It also describes how to overcome implementation problems with the proposed solutions.


STL is thought of as different things depending on whom you ask. For this discussion STL is a system of concepts and a library of generic algorithms implemented in terms of these concepts. Normally, the term STL also refers to a library of iterator, function object, and container classes and class templates based on the original document which became a proposal for the C++ standard library. The iterators, function objects, and containers are not the focus of this discussion. This article primarily discusses problems resulting from the choice of concepts and how to improve the concepts to remove the problems. In this context the use of STL refers to the application of algorithms from the STL library.

When using STL on simple sequences it generally works very well. One perceived short-coming noticed immediately is that applying an STL algorithm to an entire container is relatively clumsy because it is necessary to obtain iterators to the begin and the end of the container. Thus, an often raised request is to get algorithms supporting containers or, more generally, supporting for ranges. Although support for ranges sounds fairly straight forward, it turns out to be not that simple: Algorithms operating on ranges are generally expected to also yield ranges but often it isn't clear which of the possible ranges is desired. The apporach for adding range support consists of two parts: Ranges are required to provide begin() and end() functions yielding suitable iterators and algorithms support some form of indication on what the result should be to support construction of suitable ranges.

Several problems are introduced from the choice of using one operation to access the data associated with a position. With the current STL data associated with a position is accessed using operator*() or operations defined in terms of operator*() possibly combined with other operation, i.e., operator->() and operator[](). An immediately problem is that the access can't distinguish whether the algorithm tries to read or write a value at a position making it hard to deal with sequences not simply by values in memory. Using only one operation requires different iterator types to deal with immutable and mutable versions of a sequence leading to const_iterator and iterator types associated with many containers and the tricky interactions between these. Folding the operation to obtain a value at a position with the iterator also means that it requires different iterator types to access different fields of a value, e.g., the key and the value part of a sequence obtained from an associative container. The approach to address all of these problems consists of splitting the abstraction used to iterate over a sequence into two components: a cursor for the positioning and a property map to access a specific property at a given position.

Although the STL algorithms are meant to operate efficiently on all sort of sequences, this isn't really the case. Two examples of sequences lacking efficient support are C-style strings and actual input sequences like a sequence of values read from an external source. In both case, the problem is providing a suitable indicator of the end of the sequence. For neither of these examples the end is really known before reaching it and a typical approach to address this is to implement the comparison of iterators as a two stage process: first determine if each of the iterators is an indicator for the end and then to compare the flags thus obtained. If it were statically known that one of the iterators is an indicator for the end it would suffice to look at the actively moving iterator. The approach to deal with this is to allow different types for the begin and the end of a sequence for single-pass and forward sequences.

There is another issue related to the end of sequences: Several algorithms take multiple sequences as argument but only for the first sequence an end iterator is specified. This opens an easy opportunity for out of sequence accesses and prevents an easy apporoach to limit sizes of ranges. The approach to deal with this is require end iterators for all sequences, not just for the first sequence.

The discussion below gives more details and examples for the problems mentioned. It also gives a thorough discussion on what is changed and discusses the impact of these changes.

When updating a library interface of a library an important choice is whether the new interface should be source or even binary compatible to the existing interface. For this discussion it is assumed that source compatibility to the existing interface isn't essential and the resulting algorithm interfaces will deviate in a small but significant way (end iterators are always required). To avoid confusion between the current STL algorithms and the new algorithms and to provide a migration path to the use of the new algorithms the new algorithms can be defined a separate namespace. In this article the namespace kuhl is used for the new algorithms (I could have used my name kühl or its correct transcription kuehl but I doubt many people can type the former easily or pronounce the latter in a reasonable way).

There is an obvious migration path and it may be possible to avoid the source incompatibilities. What is more important is that classes using the current concepts to interact with the algorithms can be supported by the new algorithms to obtain the currently provided functionality. To use additional features introduced by the changes by the concepts it may be necessary to provide additional facilities, primarily it is sometimes useful to provide some traits.

Although many of the features can be implemented with C++ 2003, the implementation would be a lot bigger and also require the definition of additional traits. Thus, this article assumes that C++2011 features can be used. The necessary features are primarily variadic templates, decltype(), and new-style function declarations. In some cases there may be a performance impact if r-value references are not supported.

There are several areas of changes to the algorithms which are not explored, yet, in this article:

  1. When ranges are passed to and from algorithms the algorithms can be chained and it may be reasonable to only lazily evaluate the algorithms. Similar to expression templates objects would be created holding the details of the algorithms and the ranges they operate on.
  2. For involved algorithms operating on larger sequences it may be reasonable to process the data concurrently. For concurrent processing the data needs to be split to into suitable blocks. Intel Threading Building Blocks includes work on concurrent algorithms.
  3. When operating on large sequences it may be reasonable to use parallel approaches employing GPU hardware.


This section gives more details on the problems and how to address these problems. It discusses each problem giving code examples for the current implementation as well as for how the solution could look like with an implementation using modified concepts. The implementations using the new concepts aren't given in this section because the details of the corresponding concepts are not, yet, discussed. However, the implementation of all algorithms used in the examples is given later in the article.


For the missing support of ranges both the problem and the idea to its solution are easily described. The problem is that the STL algorithms ask for begin and end iterators and ranges like containers already know these. Thus, it would be desirable to not pass the pair of iterators but rather the container to the algorithm and have the algorithm determine the iterators itself. For example, if a container c should be searched for a value v the current code would look something like this:

auto it = std::find(c.begin(), c.end(), v);

It is desirable to have the algorithm obtain the iterators itself for various reasons. The immediate reason is that it is less typing, of course. However, if the container is obtained from sort of non-trivial expression it can be quite annoying to use the expression twice and if the expression actually requires some sort of non-trivial expression, e.g., a look-up in an associative container, using the expression twice may even be a performance problem which would require to first bind the container to a reference and then search it. Instead, the above statement could be rewritten to become

auto it = kuhl::find(c, v);

Awesome! All the algorithm does is to call kuhl::begin(c) and kuhl::end(c) which itself call c.begin() and c.end() but can also deal with built-in arrays. Passing the results from these two functions to another version of kuhl::find() nicely deals with ranges.

Unfortunately, this is only a partial solution to the problem because it immediately opens up a few new problems which need to be dealt with:

  1. The result is an iterator rather than a range. Thus, algorithms could not be chained easily. It would be nice if the algorithm could also return a range. When it should return a range the question becomes which range to return? An argument can be made that it should be the range from the found element to the end because this is the only meaningful range in case of a sequence which can only be traversed once. However, when dealing with forward iterators it may be desirable to get the range from the beginning to the found element. The approach to deal with this is to allow the user to specify what the function should return in some way.
  2. When a temporary object is passed as range to kuhl::find() yielding an iterator would be rather meaningless because the iterator has only meaning when the object is also accessible. Even if it wasn't intentional to pass a temporary to the function it is a very easy mistake to make. With C++ 2011 it is possible to detect that an r-value was used and catch certain errors.
  3. The STL algorithms normally take their parameters by value. However, copying a container when passing it as argument of an algorithm is probably wrong in most cases. This could lead to the problem of having to decide how each algorithm takes its arguments. Again detecting how the argument was actually passed by deducing the corresponding template argument using template r-value notation seems to be the solution the problem.

Dealing with different results to be returned from the same function sounds awkward but is actually fairly straight forward and provides quite a bit of interesting flexibility. The key idea is to define a concept for a "result builder" which is a type with a static function taking multiple iterators and building whatever result is appropriate. For the kuhl::find() example the result builder takes three iterators (the begin and end of the range and an iterator to the found object) and returns an object suitable for the desired result. Here are a few examples:

auto it    = kuhl::find<kuhl::result_iterator>(c, v);  // the default
auto start = kuhl::find<kuhl::result_start_range>(c, v);
auto tail  = kuhl::find<kuhl::result_tail_range>(c, v);
auto copy  = kuhl::find<kuhl::result_tail_copy<>>(c, v);

The first example is equivalent to the default behavior and just returns an iterator to the found location. The second and third example return an object storing iterators to the respective range and providing the interface expected by a suitable range concept, i.e., primarily begin() and end() functions. The last example demonstrates the flexibility of this approach: it actually yields a copy of the elements from the found element to the end of the argument range. However, the support for specifying the desired result is limited to static types. If more flexibility is needed, e.g., to construct a container for the result using a non-default allocator, some other approach needs to be used. However, creating a copy of the tail elements demonstrates a case where passing a temporary container to kuhl::find() is actually acceptable!

When the range readily exists, it is obvious that it is just used. How about creating a range on output? For example, what happens when using a container as the destination of a call to copy()?

kuhl::copy(from, to);

The answer is that that is similar to providing the begin() iterator of container to the algorithm, i.e., the values get overwritten. One difference is that modified algorithms will also provide an end() such that the copy stops when there are no more elements. If a different behavior than overwriting existing elements is desired this needs to specified using, e.g., a suitable range:

kuhl::copy(from, kuhl::insert_back_range(to));
kuhl::copy(from, kuhl::insert_front_range(to));
kuhl::copy(from, kuhl::insert_range(to, it));

At this point it is worth noting that support for ranges in STL algorithms make the use of algorithms more convenient. However, it doesn't add any flexibility from the algorithms which isn't present with the current interface. However, the added convenience is an important aspect and will be taken into account.

Multiple Properties

The STL concepts assume that there is one value at a given position. The one value is what operator*() or its relatives operator->() and operator[]() access. There are, however, many sequences which have multiple values at each position. A canonical example taken from the standard C++ library are associative containers: each position has key and value. The STL approach is to bundle them together as a std::pair<Key const, Value> to turn the two entities into one value. This doesn't address accessing individual parts of the value, however. To get a sequence of all keys in an associative container it is necessary to encapsulate the iterator with one mapping only to the first element of the `std::pair. The approach to address this is to optionally tell for each sequence how a value is to be accessed, e.g.:

kuhl::copy(m.begin(), m.end(), std::ostream_iterator<Pair>(std::cout, " "));        // print both
kuhl::copy(_1st, m.begin(), m.end(), std::ostream_iterator<Key>(std::cout, " "));   // print keys
kuhl::copy(_2nd, m.begin(), m.end(), std::ostream_iterator<Value>(std::cout, " ")); // print values

In the above example _1st and _2nd are property maps accessing the first and the second member, respectively, at the iterator's position. The property maps are passed using values because in general it is desirable to associate state with them although this isn't necessary when accessing a specific member.

The above uses cases could be dealt with suitable iterator adapters dealing with the necessary transformation. A use cases where this becomes a lot more messy is using a std::vector<std::pair<Key, Value>> together with sort() and lower_bound() to implement a [constant] map. The sort() cannot use an adapted iterator because only incomplete objects would be moved. Thus, it would be necessary to use a comparison object for the sort() using only the first element of a std::pair<Key, Value>. On the other hand, this comparison object can't be used for lower_bound() because only the Key is available. Thus, for lower_bound() the iterators need to be adapted and for the actual result the original iterator needs to be restored. Using property maps instead makes this nice and clean:

std::vector<std::pair<Key, Value>> map;
// fill the map
kuhl::sort(_1st, kuhl::identity_pm(), map.begin(), map.end());
auto it = kuhl::lower_bound(_1st, map.begin(), map.end(), key);

Both the calls to kuhl::sort() and kuhl::lower_bound() use the _1st property map to access the first member of the values in the map. The kuhl::sort() algorithm actually accesses two different entities, though: One to obtain the key at the location and one which may need to be swapped. In the STL these two objects have to be the same but there is no conceptual necessity for this to be the case. Thus, in addition to accessing the key using the _1st property map references to the entire pair are accessed when the objects need to be swap()ed. This is done using the kuhl::identity_pm() property map which is what would have been used by default if it weren't mentioned. The example used it explicitly to show that there are actually two entities at each position involved.

Iterator vs. Const Iterator

The containers provided by the standard C++ library generally provide two kinds of iterators: iterator for mutable objects and const_iterator for immutable objects. This requires that quite a number of methods accessing essentially the same thing, i.e., the underlying sequence, is duplicated: begin(), end(),rbegin()', and rend() each come in two variants, one for non-const object yielding iterator and one for const objects yielding const_iterator. Since this isn't enough, additional versions were added to yield const_iterator even for non-const objects.

Entirely redundant code isn't the only problem, though. When an object identified by a position needs to be erased it is necessary to have a version of erase() taking a const_iterator, e.g., because the location was obtained from the const_iterator returned by find() which is a const function and, thus, can only return a const_iterator! Well, we could also have a non-const version of the find() member but this doesn't address that erase(), insert(), etc. need to be able to have their positions identified using const_iterator.

If this isn't enough, note that it is also necessary to convert from iterator to const_iterator, e.g., when providing mixed types of iterators to any of the STL algorithms. Of course, the const_iterator obtained from the algorithm isn't necessarily what is desired anyway (and this problem may be addressed to some extend by support for heterogeneous positions; see below).

This points toward separating the objects used to identify a position and the object used to access a value at a given position into two separate entities. The use of property maps the entity used to access a value at a position nicely addresses this problem: There can be just one iterator type used to identify a position for each container and different property maps used to access const or non-const containers. The property maps could be obtained from the container.

Reading vs. Writing

The STL algorithms use four operations on the value at a position: a value is read, written, moved, or swapped. However, in all cases the value is accessed the same way, using one of the operators operator*() or operator[]() (the STL algorithms don't use operator->() directly), yielding an entity which can be used like a reference. This is mostly appropriate for sequences whose values are stored in memory, i.e., where an l-value exists which can be accessed using a reference. There are interesting sequences not using l-values. std::vector<bool> is a well-known example using a reference type potentially reading or setting a bit for each of its elements: Bits in a byte can't be accessed with a reference and reading and writing the values are different options. Reading or writing a sequence of database rows is another example where the values should only be written when they are changed.

When looking at different capabilities of sequences it is obvious that the traversal capabilities (e.g. forward vs. bidirectional iteration) are orthogonal to the capabilities of reading or writing a sequence. Independent on how a sequence can be traversed, the individual values can support different combinations of being readable, writable, or both as well as being movable or swappable. Bundling the access and the traversal capabilities into concepts thus creates a large number of different concepts compared to the linear combination if these capabilities are treated separately.

This is also a facility which is easily separated and dealt with using property maps. All what is necessary is that property maps support different mechanisms for reading and writing values. Things become a bit more interesting to also support dedicated mechanisms for moving and swapping: if a readable and writable property should also be movable and swappable using the corresponding read and write operations the move and swap operations need to default to something build from reading and writing.

Limited Output Sequence

Many STL algorithms take multiple sequences which are meant to be of sufficient size. However, where this is the case only the first sequence takes both a begin and an end iterator. All other sequence are specified only by a begin iterator and are expected to be at least as long as the first sequence. For example

std::copy(begin, end, to);

The sequence started by the iterator to is expected to have at least std::distance(begin, end) elements. If this isn't the case the algorithm can end up creating, e.g., a buffer overrun. The new interface always uses sequences delimited at both ends. For example, the all to copy() would look like this:

auto last = kuhl::copy(source_begin, source_end, target_begin, target_end);
assert(last.first == source_end || last.second == target_end);
// last.first  == end of the processed source sequence
// last.second == end of the processed target sequence

For many of the algorithms this requires that the result of the algorithm is changed to indicate the last position of a sequence which would otherwise have been fully consumed. For example, the call to kuhl::copy() the algorithm needs to return an iterator to both the from sequence and the to sequence because the source sequence may not have been fully consumed or the the target sequence been fully consumed.

This seems to impose the limitation that sequences have to be bounded. Unbounded sequences are supported but still require an end indication but it can be one which is never reached. An easy way to achieve this is to have the comparison between the iterator and the end indication always be false which can even be made a constant. The next issue discussed covers a more general problem and describes how to solve this.

Heterogeneous Iterators

Sequences in STL are delimited by iterators of the same type. Using the same iterator type for the begin and the end is reasonable for bidirectional and random access sequences where both the begin and the end iterator can be used to move through the sequence. It isn't that useful for sequence which can only be traversed front to back. Using a different type for the end enables some interesting new features and allows some important optimizations:

  1. For C-strings the begin is known but the end isn't known. Using the comparison for the end as a test for the iterator pointing to a null character would make C-strings readily accessible to the algorithms. At the moment it is necessary to first determine the end of the C-string using, e.g., strlen().
  2. More general, the comparison between an iterator and an end object could be used to end a forward traversal of sequences when some condition on the current element is met (which would probably including a check if the underlying sequence is ended). This is especially useful when using a single-pass sequence like input read from an external source.
  3. Many generating iterator like, e.g., std::istream_iterator<T> use a default constructed iterator to indicate that the end has been reached. This requires that some sort of indication of whether the iterator can generate more values is used when comparing elements. When comparing against an object of a type indicating the end of a sequence only one object needs to be investigated.
  4. When limiting output sequences really unlimited sequences can efficiently be limited by comparing against an object of type such that the comparison yields a constant expression with the value false.

One possible objection is that it might seem as if object used to indicate the end of a sequence need to be converted to the iterator type to produce an end iterator, e.g., to indicate that no matching value was found by find(). This is, however, not really needed! Whenever an end iterator needs to be produced it is obtained from the begin iterator and traversing the entire sequence. For some algorithms it needs a bit care to keep a suitable iterator around but it can be easily accomplished for all algorithms in the STL which yield an iterator.

Note, that using different types for the begin and the end of a sequence is limited to sequences traversed from front to back. Where bidirectional or random access iterators are needed the sequence needs to be delimited by iterators of the same type.

Segmented Sequences

Some sequences are segmented in some way. The canonical example of a segmented data structure is std::deque<...> which is an array of arrays. The inner arrays form subsequences of the entire sequence. Other segmented sequences are not as obvious but can be fairly important like, e.g., the sequence of characters in a stream which are segmented by the buffer held by a std::streambuf. Treating a segmented sequence as a normal sequence works but isn't as efficient as treating the individual sequences separately. Treatment of segmented sequences is an important optimization, e.g., for the implementation of std::deque<...> and for stream operations.

The details of implementing this optimization are relatively complex and not, yet, discussed by this article. Once there is an implementation of all STL algorithms which is then being optimized this abstraction will be described (or if someone pesters me to write it up).


There are several important changes to be made to the interface of the STL algorithms to address the problems discussed above. Here is a quick summary:

  1. All sequences are delimited by a begin and an end and algorithm results are adjusted correspondingly.
  2. The types of the begin and the end iterator don't have to be identical for forward traversed sequences.
  3. Where a sequence is passed, it is passed either using a pair of iterators or a range object.
  4. When iterators are returned the user can choose how these are returned.
  5. When specifying a sequence it can optionally be specified how the values are to be accessed by providing a property map.
  6. Access to values will be different for reading and writing.

There are three places where users have a choice: specifying sequences as a pair of iterator or as a range, choosing whether property maps are passed, and choosing how the results are returned. This provides at least four different options for specifying a sequence:

  1. A range only.
  2. A range and a property map.
  3. A pair of iterators.
  4. A pair of iterators and a property map.

If there are multiple property maps as is, e.g., the case for sort() there are more options. Many algorithms take multiple sequences and each sequence has multiple options on how this is specified. A naive approach to implement this would yield a combinatorial explosion on the number of algorithm interfaces. The implementation section below will describe how each algorithm can be implemented using just two known interfaces, delegating to a common function without any options.

Cursors and Property Maps

The concepts used by the STL algorithms to access sequences are split into traversal and property access. For the traversal a system of cursor is defined and for the property access a system of property maps is defined. Both concepts are interrelated, however: positioning alone doesn't provide much useful application nor does access to properties without traversal. The idea is that a cursor provides a key which is passed to a property map to identify the current position. From the algorithm's point of view the keys are opaque entities useful only to be passed to corresponding property maps.

The use of the terms key and map seems to imply some sort of look-up. Although this may indeed be the case for some applications, typically there is no look-up involved. Instead, the key is often a reference to some object for which the property map reads or writes an attribute. Put differently, the use of property maps used with keys obtained from cursors is as efficient as accessing the property directly. However, the separation of traversal and property access nicely decouples separate operations and make the use of algorithms both more flexible and easier.

A very similar separation as cursors and property maps is done for relational databases. In databases a cursor identifies a row in a table. For each row it is possible to access the value for each column. To get hold of a value a cursor is intersected with column. This is a reasonable model to visualize the relationship between cursors and property maps, too: Each cursor identifies a row and each property map identifies a column or a group of columns. Similar to databases, not all columns need to be actually stored but they may be derived from the position and/or from other columns of the table.

Cursor: Position

Cursors are used to deal with positions within a sequence. STL uses iterators but a different term is deliberately used to emphasize that cursors do not provide direct access to values. Instead, they provide some opaque key which is used with one or more property maps to actually yield suitable values. Examples of keys are references to an object whose values are accessed by a property map or indices used together with property maps referencing vectors with the actual values. The latter examples gives an indication on how multiple separate sequences, i.e., the vectors, could be viewed as a single sequence.

The concepts for cursors can be organized into a hierarchy similar to that used by the STL. However, there is no need to distinguish between input and output iterators because the direction of the access is part of the property map. Thus, there are four fundamental traversal concepts:

  1. Single-pass traversal is required for sequences which may get consumed by traversing them and which can't be traversed again. This could, e.g., be a sequence of random numbers or values written to an external destination. Sequences which are traversed by an algorithm just once are referred to as single-pass sequences,
  2. Multi-pass traversal is required for sequences which can be traversed in one direction but they can be traversed multiple times. A typical example for sequences supporting multi-pass traversal are singly linked lists. Sequences which are traversed by an algorithm multiple time are referred to as multi-pass sequences. Cursors supporting multi-pass traversal can also be used where single-pass traversal is required.
  3. Bidirectional traversal is required for sequences which can be traversed in both directions. A typical example for sequences supporting bidirectional traversal are doubly linked lists or trees. Sequences which are traversed by an algorithm in both directions are referred to as bidirectional sequences. Cursor supporting bidrectional traversal can also be used where multi-pass traversal is required.
  4. Random access traversal is required for sequences where it is possible to efficiently access any element within the sequence (i.e., with complexity O(1)). A typical example for sequences supporting random access are vectors. Sequences which are traversed by an algorithm using random access are referred to as random access sequences. Cursors supporting random access traversal can also be used where bidirectional traversal is required.

In addition to the traversal concepts a position concept is used: A position is a singular position (i.e., the concept doesn't provide access to a value associated with this position) which can be used as an end indicator for single-pass and multi-pass sequences. The type of the position can be identical as that of the cursor type it is used with but it can also have a separate type.

Property Map: What Data to Access

The job of a property map is fairly straight forward: Given a key provide appropriate access to a value associated with this key. Typically, this amounts to accessing a field of they key or using the key as an index into a vector. It can also involve reading a record identified by the key from a database or updating a database. There are different operations supported by property maps but different property map concepts aren't necessarily extensions of other property concepts:

  1. A readable property map can be used to read a value for a given key.
  2. A writable property map can be used to write a value for a given key.
  3. A read/write property map can used to both read and write a value for a given key. A read/write property map can be used where a readable or a writable property map is required.
  4. An l-value property map can be used to obtain a reference to a value. An l-value property map can be used where a read/write property map is required. Note that none of the algorithms requires an l-value property map. The main reason for defining this concept is to support property map adapters which may require to access individual fields of an object.
  5. A move property map can be used to move an object from one location to another location.
  6. A swap property map can be used to swap two objects.

The exact details of the property maps beyond the read/write property map aren't, yet, clear: So far, only few algorithms are implemented needing moving or swapping of objects and it isn't quite clear what is really needed for those.

Iterator: Combination of Cursor and Default Property Map

The STL iterator concepts are equivalent to a combination of one of the cursor concepts with an l-value property map. To make existing sequences easily accessible to the new style algorithms the cursor concepts will be very similar to the iterator concepts. In addition, the algorithms will allow omission of property maps in which case a default property map is used which is an l-value property map passing through the reference passed in as key: Since operator*() for iterators yields a reference this is a direct match for the concepts used by the current STL.

Proposed Concepts


  1. Position
  2. Single Pass Cursor
  3. Multi Pass Cursor
  4. Bidirectional Cursor
  5. Random Access Cursor

Property Maps

  1. Readable Property Map
  2. Writable Property Map
  3. Read/Write Property Map
  4. Lvalue Property Map
  5. Movable Property Map


Function Objects

Implementation Details

Combinatorial Explosion of Algorithm Interfaces?

Most General Implementation

Fundamental vs. Composite Algorithms


Other Work

Boost's Iterator Adaptors

Andrei Alexandrescu's Ranges