4 changes: 4 additions & 0 deletions doc/algorithm.qbk
@@ -46,6 +46,10 @@ Thanks to all the people who have reviewed this library and made suggestions for
[include knuth_morris_pratt.qbk]
[endsect]

[section:Sorting Integer Sorting Algorithms]
[include integer_sort.qbk]
[endsect]

[section:CXX11 C++11 Algorithms]
[include all_of.qbk]
[include any_of.qbk]
88 changes: 88 additions & 0 deletions doc/integer_sort.qbk
@@ -0,0 +1,88 @@
[/ QuickBook Document version 1.5 ]

[section:integer_sort Integer Sorting Algorithms]

[/license
Copyright (C) 2014 Jeremy W. Murphy

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]


[heading Overview]
Integer sorting algorithms take advantage of the properties of integers to sort them by mechanisms other than comparison.
The counting sort algorithm counts the frequency of each value in the input to form an intermediate representation of the data, from which a stable, ordered sequence can be created.
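
The counting step can be sketched in isolation as follows. This is a minimal illustration, not the library's implementation: the name `counting_sort_sketch` and its `std::vector` interface are assumptions made for the example, and the caller is assumed to know `min` and `max`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of the library: stable counting sort of
// unsigned values that are known to lie in [min, max].
inline std::vector<unsigned> counting_sort_sketch(std::vector<unsigned> const &input,
                                                  unsigned min, unsigned max)
{
    assert(min <= max);
    // One counter per possible value in [min, max].
    std::vector<std::size_t> count(static_cast<std::size_t>(max - min) + 1, 0);
    for (unsigned x : input)
    {
        assert(min <= x && x <= max);
        ++count[x - min];
    }
    // After the partial sum, count[v] is one past the last output index
    // that holds the value v + min.
    std::partial_sum(count.begin(), count.end(), count.begin());
    std::vector<unsigned> output(input.size());
    // Walk the input backwards so that equal values keep their relative order.
    for (auto it = input.rbegin(); it != input.rend(); ++it)
        output[--count[*it - min]] = *it;
    return output;
}
```

The array of counts has max - min + 1 entries, which is where the dependence on the range of the input comes from.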

Least-significant digit (LSD) radix sort uses counting sort to order input data iteratively. With a default one-byte digit, radix sort runs counting sort on one digit of the input at a time.
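
As a rough sketch of the idea, assuming 32-bit keys and the one-byte digit mentioned above, an LSD radix sort can be written as four counting passes. The function name `lsd_radix_sort_sketch` is hypothetical and the code is independent of the library's own implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the library: LSD radix sort of 32-bit
// keys that runs one stable counting pass per byte, lowest byte first.
inline void lsd_radix_sort_sketch(std::vector<std::uint32_t> &data)
{
    std::vector<std::uint32_t> buffer(data.size());
    for (unsigned shift = 0; shift < 32; shift += 8)
    {
        // Count how often each byte value occurs at the current digit.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t x : data)
            ++count[(x >> shift) & 0xFFu];
        // Prefix sums: count[b] becomes one past the last slot for byte b.
        std::size_t total = 0;
        for (std::size_t &c : count)
        {
            total += c;
            c = total;
        }
        // Fill from the back so that each pass is stable.
        for (auto it = data.rbegin(); it != data.rend(); ++it)
            buffer[--count[(*it >> shift) & 0xFFu]] = *it;
        data.swap(buffer);
    }
    // Four passes means an even number of swaps, so the result is in data.
    assert(std::is_sorted(data.begin(), data.end()));
}
```

Stability of each pass is what makes the scheme work: ordering by a higher digit must not disturb the order already established by the lower digits.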


[/ Counts are stored in an array indexed by the value.
/ The partial sum of the array of counts is calculated, calculating the right-most index of each value / in the output.
/ Values are then read from the end of the input, storing each in its position calculated from the array, which is decremented at each step.
/ It is limited to sorting types that can be projected in order onto an unsigned integral type.
]

[heading Interface]
The input iterator must be bidirectional and the output iterator must be random access. The caller must allocate space for the result beforehand. The basic interface requires the input value type T to be an unsigned integral type. Radix sort and counting sort have almost identical interfaces: counting sort has one additional parameter, digit, which radix sort calculates and passes to counting sort internally.

``
template <typename Input, typename Output>
Output stable_counting_sort(Input first, Input last, Output result);
template <typename Input, typename Output>
void radix_sort(Input first, Input last, Output result);
``

The next interface introduces customization of the conversion to allow user-defined types.
The output type of the conv function has the same requirements as T above.
``
template <typename Input, typename Output, typename Conversion>
Output stable_counting_sort(Input first, Input last, Output result, Conversion conv);
``
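
To illustrate what a conversion might look like, the sketch below sorts a user-defined record by a one-byte key. Everything in it is hypothetical: `employee`, `grade_of` and `counting_sort_by_key` are names invented for this example, and `counting_sort_by_key` is a simplified stand-in for the interface above that assumes the key fits in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record type used only for this illustration.
struct employee
{
    std::string name;
    std::uint8_t grade;
};

// Hypothetical conversion: projects an employee onto its unsigned key.
struct grade_of
{
    typedef std::uint8_t result_type;
    result_type operator()(employee const &e) const { return e.grade; }
};

// Simplified stand-in for the four-argument interface (an assumption, not
// the library function). It only handles one-byte keys.
template <typename Input, typename Output, typename Conversion>
Output counting_sort_by_key(Input first, Input last, Output result, Conversion conv)
{
    static_assert(sizeof(typename Conversion::result_type) == 1, "one-byte keys only");
    std::vector<std::size_t> count(256, 0);
    for (Input it = first; it != last; ++it)
        ++count[conv(*it)];
    std::size_t total = 0;
    for (std::size_t &c : count)
    {
        total += c;
        c = total;
    }
    // Reverse traversal keeps records with equal keys in input order.
    for (std::reverse_iterator<Input> it(last), end(first); it != end; ++it)
        result[--count[conv(*it)]] = *it;
    return result;
}
```

Called as `counting_sort_by_key(in.begin(), in.end(), out.begin(), grade_of())` with `out` already sized to match `in`, records with equal grades keep their input order.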
The next interface adds the option to specify min and max manually.
``
template <typename Input, typename Output, typename Conversion>
Output stable_counting_sort(Input first, Input last, Output result, Conversion conv, T min, T max);
``
Finally, the complete interface for total customization includes specifying the radix and digit.
``
template <typename Input, typename Output, typename Conversion>
Output stable_counting_sort(Input first, Input last, Output result, Conversion conv, T min, T max, unsigned radix, unsigned char digit);
``

[heading Complexity]
Let k equal the range of the input (max - min). Counting sort runs in \Theta(k) space. If k = O(n), counting sort runs in \Theta(n) time, otherwise it runs in \Theta(n + k).

If k = O(n), radix sort runs in \Theta(dn) time, where d is the number of digits; otherwise it runs in \Theta(d(n + k)). Even though this complexity is worse than that of counting sort, the performance characteristics more than make up for it in practice.

Space complexity for radix sort depends on the width of the unsigned integral type divided by the radix, called digits in the algorithm:

[table Space complexity of radix sort
[[digits] [space complexity]]
[[1] [\Theta(k)]]
[[2] [\Theta(n + k)]]
[[>=3] [\Theta(2n + k)]]
]

If digits equals one, LSD radix sort is equivalent to stable counting sort. When digits equals two, one temporary buffer is required, and for greater than two digits, two temporary buffers are required.
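
As a worked example of this arithmetic, the helpers below compute the number of digits for a key type and the number of temporary buffers that the text above associates with it. Both function names are hypothetical, and rounding the division up is an assumption; the library may compute digits differently.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper: number of digits (passes) for key type T when each
// digit is radix_bits wide. Rounding up is an assumption of this sketch,
// and radix_bits must not be zero.
template <typename T>
constexpr unsigned digits_for(unsigned radix_bits)
{
    return static_cast<unsigned>((sizeof(T) * CHAR_BIT + radix_bits - 1) / radix_bits);
}

// Temporary buffers needed for a given number of digits, as described in
// the text: none for one digit, one for two digits, two otherwise.
constexpr unsigned temporary_buffers(unsigned digits)
{
    return digits <= 1 ? 0u : (digits == 2 ? 1u : 2u);
}

// A 32-bit key with one-byte digits needs four passes and two buffers.
static_assert(digits_for<std::uint32_t>(8) == 4, "four one-byte digits");
static_assert(temporary_buffers(digits_for<std::uint32_t>(8)) == 2, "two buffers");
// A 16-bit key with one-byte digits needs two passes and one buffer.
static_assert(temporary_buffers(digits_for<std::uint16_t>(8)) == 1, "one buffer");
```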

// To guarantee the best linear complexity...


[heading Exception Safety]
Counting and radix sort take their parameters by value and have no global state.

[heading Customization Points]
If UnsignedInteger(T) is false, a Conversion type is required to project T onto an unsigned integral type of appropriate size.
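
For instance, a signed 32-bit integer can be projected onto an unsigned one in an order-preserving way by flipping its sign bit. The function object below is a hypothetical sketch of such a conversion, not something the library provides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical conversion, not part of the library: maps std::int32_t onto
// std::uint32_t so that a < b implies conv(a) < conv(b).
struct signed_to_unsigned
{
    typedef std::uint32_t result_type;

    result_type operator()(std::int32_t x) const
    {
        // The cast is modular; flipping the top bit then places negative
        // values below non-negative ones.
        return static_cast<std::uint32_t>(x) ^ UINT32_C(0x80000000);
    }
};

// A few spot checks of the ordering property.
inline void check_signed_to_unsigned()
{
    signed_to_unsigned conv;
    assert(conv(INT32_MIN) == 0u);
    assert(conv(-1) < conv(0));
    assert(conv(0) < conv(1));
    assert(conv(INT32_MAX) == UINT32_MAX);
}
```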


[heading Performance]
Radix sort performance is proportional to the size of T and k.

On x86_64, radix sort is approximately 20 times faster than std::sort at sorting `char`,
10 times faster for `short`, 4 times faster for `int`, and almost 2 times faster for `long`.


[heading Notes]
It is typical for algorithms to treat empty input (n = 0) as a special case. These algorithms also treat n = 1 as a special case in which no sorting work needs to be done. This was largely motivated by the fact that the LSD radix sort algorithm calculates log(n) and does not expect zero.
However, it also makes logical sense for a sorting algorithm in general.

[endsect]
12 changes: 12 additions & 0 deletions include/boost/algorithm/integer_sort.hpp
@@ -0,0 +1,12 @@
// (C) Copyright Jeremy W. Murphy 2013.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#ifndef INTEGER_SORT
#define INTEGER_SORT

#include <boost/algorithm/integer_sort/counting-sort.hpp>
#include <boost/algorithm/integer_sort/radix-sort.hpp>

#endif
177 changes: 177 additions & 0 deletions include/boost/algorithm/integer_sort/counting-sort.hpp
@@ -0,0 +1,177 @@
// (C) Copyright Jeremy W. Murphy 2013.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

/** \file counting-sort.hpp
* \brief Stable counting sort.
*/

#ifndef COUNTING_SORT
#define COUNTING_SORT

#include <iterator>
#include <numeric>
#include <cassert>
#include <limits>
#include <vector>
#include <type_traits>
#include <algorithm>
#include <cstdint>

#include <boost/concept_check.hpp>
#include <boost/concept/requires.hpp>


namespace boost {
namespace algorithm {
namespace detail {

template <typename Value, typename Shift, typename Bitmask>
inline Value count_index(Value const a, Shift const b, Value const c, Bitmask const d)
{
return ((a >> b) - c) & d;
}
}


namespace transformation
{
template <typename T>
struct identity
{
typedef T result_type;

identity() {}

T const &operator()(T const &x) const
{
return x;
}
};


// For types that are implicitly convertible to an unsigned integral type.
template <typename T>
struct implicit
{
typedef T result_type;

implicit() {}

template <typename U>
T operator()(U const &x) const
{
return x;
}
};
}


/**
* Requires that client allocates space for result beforehand.
*
* @brief Generalized stable counting-sort.
*
* \c Input Bidirectional input iterator.
* \c Output Random access output iterator.
*
* \param first Input iterator that points to the first element of the unsorted data.
* \param last Input iterator that points past the last element of the unsorted data.
* \param result Output iterator that points to the first element where the sorted data will go.
* \param conv Function object that converts the input type to an unsigned integral type.
* \param min The smallest value present in the input >> radix * digit.
* \param max The largest value present in the input >> radix * digit.
* \param radix The radix or width of digit to consider.
* \param digit Which digit to consider.
*/
template <typename Input, typename Output, typename Conversion>
BOOST_CONCEPT_REQUIRES(
((BidirectionalIterator<Input>))
((Mutable_RandomAccessIterator<Output>))
((UnsignedInteger<typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type>))
, (Output))
stable_counting_sort(Input first, Input last, Output result, Conversion conv,
typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type const min,
typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type const max,
unsigned const radix, unsigned char const digit)
{
typedef std::reverse_iterator<Input> ReverseIterator;

if(first != last)
{
if(std::next(first) == last)
*result++ = *first;
else
{
typedef typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type T;
assert(radix != 0);
// TODO: Maybe this next assertion should be an exception?
assert(max - min != std::numeric_limits<uintmax_t>::max()); // Because k - min + 1 == 0.
auto const shift = radix * digit;
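// Shifting by the full width of the type is undefined, so a full-width radix gets an all-ones mask.
uintmax_t const bitmask = radix < static_cast<unsigned>(std::numeric_limits<uintmax_t>::digits) ? (uintmax_t(1) << radix) - 1 : ~uintmax_t(0);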
std::vector<uintmax_t> C(static_cast<uintmax_t>(max - min) + 1);
ReverseIterator rfirst(last);
ReverseIterator const rlast(first);

// TODO: Could this be done faster by left-shifting _min and _bitmask once instead of right-shifting the value n times?
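// The elements have the input value type; conv projects them onto T.
std::for_each(first, last, [&](typename std::iterator_traits<Input>::value_type const &x)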
{
C[detail::count_index(conv(x), shift, min, bitmask)]++;
});

std::partial_sum(C.begin(), C.end(), C.begin());

for(; rfirst != rlast; rfirst++)
*(result + --C[detail::count_index(conv(*rfirst), shift, min, bitmask)]) = *rfirst;
}
}
return result;
}


template <typename Input, typename Output, typename Conversion>
BOOST_CONCEPT_REQUIRES(
((BidirectionalIterator<Input>))
((Mutable_RandomAccessIterator<Output>))
((UnsignedInteger<typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type>))
, (Output))
stable_counting_sort(Input first, Input last, Output result, Conversion conv,
typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type const min,
typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type const max)
{
unsigned const radix(sizeof(typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type) * 8);
return stable_counting_sort(first, last, result, conv, min, max, radix, 0);
}


template <typename Input, typename Output, typename Conversion>
BOOST_CONCEPT_REQUIRES(
((BidirectionalIterator<Input>))
((Mutable_RandomAccessIterator<Output>))
((UnsignedInteger<typename std::result_of<Conversion(typename std::iterator_traits<Input>::value_type)>::type>))
, (Output))
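stable_counting_sort(Input first, Input last, Output result, Conversion conv)
{
if(first != last)
{
typedef typename std::iterator_traits<Input>::value_type value_type;
// Compare and bound by the converted key so that user-defined types work with a custom conv.
auto const bound(std::minmax_element(first, last, [&](value_type const &a, value_type const &b)
{
return conv(a) < conv(b);
}));
return stable_counting_sort(first, last, result, conv, conv(*bound.first), conv(*bound.second));
}
else
return result;
}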


template <typename Input, typename Output>
BOOST_CONCEPT_REQUIRES(
((BidirectionalIterator<Input>))
((Mutable_RandomAccessIterator<Output>))
, (Output))
stable_counting_sort(Input first, Input last, Output result)
{
return stable_counting_sort(first, last, result, transformation::identity<typename std::iterator_traits<Input>::value_type>());
}
}
}
#endif