Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 1.12.0 #196

Merged
merged 79 commits into from Dec 2, 2021
Merged

Release 1.12.0 #196

merged 79 commits into from Dec 2, 2021

Conversation

Morwenn
Copy link
Owner

@Morwenn Morwenn commented Dec 2, 2021

No description provided.

The old algorithm was an unsightly O(n² log n) algorithm, and a really
cunning on at that since it returned in the blink of an eye (generally
even faster than that) for most inputs, and only hit its worst case for
a few inputs.

The new algorithm is a O(n) algorithm described by T. Altman and Y.
Igarashi in *Roughly Sorting: Sequential And Parallel Approach*. It runs
in O(n) space and doesn't run as fast for most inputs, but it at least
doesn't hide as many surprises.

The old algorithm is still used when not enough memory is available, to
ensure that the next 1.x.y release won't be a breaking change, but it
will be removed in cpp-sort 2.0.0.
Only the O(n) algorithm works with bidirectional iterators, which means
that the fallback to the O(n² log n) algorithm only works with
random-access iterators.
Instead of computing all the values of DM and extracting the biggest
one, we compute them on the fly and compare them to the current biggest
element to avoid having to store O(n) values for DM.
This in turns makes the probe::par fallback algorithm run in O(n log n)
time and O(1) space instead of O(n² log n) time. That change makes the
fallback algorithm worth keeping no matter what since O(n log n) is
reasonable enough a suprise - the deprecation warning has been removed
accordingly.

Thanks a lot to Control from The Studio for showing me that it was
actually easy enough to make is_p_sorted O(n) and for giving me the
necessary insights to make it so.
is_p_sorted and the O(n log n) algorithm relying on it were changed to
work with forward iterators. probe::par now works with forward
iterators, but can only use the O(n) algorithm for bidirectional
iterators and their refinements.
In *Right invariant metrics and measures of presortedness*,
Estivill-Castro, Mannila and Wood mention that Par(X)=Dis(X) for all X.
After checking, it appears that probe::par and probe::dis do always
return the same results despite having been implemented differently to
match different definitions.

Estivill-Castro and Wood then stop using Par(X) and use Dis(X)
consistently in all their subsequent works, so it makes sense to drop
probe::par and only keep probe::dis in cpp-sort. This commit deprecates
probe::par - which will be removed in a future breaking release -, and
changes probe::dis to use the current implementation of probe::par,
which has a better time complexity.

Thanks a lot to Control from The Studio, who first suggested that Dis
and Par were the same measure of presortedness, and pushed me to look
into it.
The array wasn't needed because the cumulative min can be computed oon
the fly during the computation of DM. This new version computes LR then
goes straight to a fused RL/DM algorithm without extra space.

The new algorithm was more than twice as fast than the old one in
benchmarks.
The value of Osc(X) is obtained by computing j - i, and can only grow
through the course of the algorithm. Since i always decreases, if it
becomes smaller or equal to the current value of Osc(X), then it means
that we're done because we can't find a bigger Osc(X) than the current
value.
The implementation called std::min and std::max on projected values
without passing the comparators to them, leading compile-time errors
or runtime errors depending on whether the results of the projected
values were comparable, and whether it compared them correctly.
Thanks to Control from The Studio for the algorithm. The old one is kept
for now to avoid breaking the case where not enough memory is available,
but it is deprecated and will be removed in version 2.0.0.
The new implementation reduces the number of class template
instantiations to 2 and reduces a bit the size of the binary in debug
mode.
This would produce different template instantiations for every variation
of fixed_buffer and dynamic_buffer, while they all provide a pointer and
a size, which is all Wiki::sort cares about. Hoisting the buffer
instantiation to the top-level wiki_sort function cuts the number of
instantiations, which notably shrinks the size of the test suite by 2%.
The test suite used to instantiate a thread_local PRNG for every
instantiation of dist::shuffled::operator() and shuffled_16_values. This
commit introduces a single thread_local PRNG common to all distrubtions,
which makes the whole TLS segment of the test suite executable
disappear.
Replace it with a constexpr variable inside the function: in cpp-sort
this parameter is always fully dependent on the Compare and Projection
parameters.
When 'param' is a by-value function parameter, 'return param;'
automatically moves, but not something like 'return param += 1;'. This
commit avoids the inhibition of the automatic move by only using the
name of the parameter in the return expression.
Only the declaration of scoped_memory_exhaustion and its member
functions are needed in the header, put everything in the .cpp. This
also seems to fix what is probably another thread_local issue with
MinGW-w64 10.3.
Reorganize, simplify, add a graph to sum up the relations between the
different components (and the DOT source code to draw it).
That type trait replaces the is_std_forward_list and is_std_list traits
that were previously used. It is a much more generic and scalable
solution to the problem.
stable_t<T> used T directly when it was a stable_adapter specialization
instead of trying to use T::type when such a member type exists, which
could inhibit some specializations. This change improves the stable_t in
the presence of accidental stable_adapter nesting, which is more and
more common as stable_t becomes the main consumer interface.

This commit also updates the documentation accordingly, and splits the
graph explaining stable adapters into two: one explaining how the
stable_adapter works, and another one explaining what stable_t aliases.
This is mostly to harden the library and future dependencies against
forgetting to add is_always_stable to every specialization. We keep the
typedef in every specialization we ship to make sure that it's
impossible to use incorrectly, but this specialization of sorter_traits
makes it harder to get things accidentally wrong.
Instead of having a table, just recompute the numbers as needed, which
is an inexpensive as two shifts and a subtraction per number. This makes
no noticeable speed difference, scales more than the table solution, and
reduces the size of the produce binaries.
merge_insertion_sort and slab_sort failed when sorting an empty
collection for the same reason: they tried to create an empty
fixed_size_list, which actually causes all knids of issues. The two
sorters don't attempt to creat such a list anymore, and an assertion was
added to the fixed_size_list constructor to ensure that it never
happens.
I still don't understand what's wrong with the namespace-scope extern
thread_local variables, but even though I wiggled through issues to make
it work on other platforms, it doesn't work at all on MSVC. This commit
uses a function instead to create a thread_local variable in a way
similar to a Meyers singleton.
It was a remnant of old versions of Catch where passing an expression
with commas outside of inner parentheses was not possible, which
typically happened with explicit template parameters.

The only ones left are due to other reason, such as Catch2 not wanting
to decompose some expressions, the typical suspect being ||.
The library itself has compiled without warnings with /W2 for a long
time, but the test suite was litered with conversion warnings that
weren't errors but required some design tweaks in distributions. This
commit solves those issues and finally makes the warnings disappear.
- Use xoshiro256** instead of mt19937_64
- Move random.h from the library to the test suite
- Use the same two thread_local engines everywhere
- Put thread_local engines in random.h (not in distributions)
- Move thread_local engine definitions to random.cpp

All those changes should hopefully simplify the handling of random
numbers in the test suite, and possibly fix some of the -Winline
warnings I constantly got with GCC, which I suspect were due either to
the thread_local variables or to the size of the mt19937_64 engines.
@Morwenn Morwenn merged commit f081ed4 into master Dec 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant