
deprecate and remove qsort? #13234

Closed
p5pRT opened this issue Sep 6, 2013 · 35 comments

p5pRT commented Sep 6, 2013

Migrated from rt.perl.org#119635 (status was 'resolved')

Searchable as RT119635$

p5pRT commented Sep 6, 2013

From @nwc10

tl;dr: Nothing current uses C<use sort 'quicksort';> - should we remove it?

The sort pragma was added by this commit in 5.7.x times:

commit 84d4ea4
Author: Jarkko Hietaniemi <jhi@iki.fi>
Date:   Wed Nov 21 22:25:14 2001 +0000

  Implement the sort pragma. Split sort code from pp_ctl.c
  to pp_sort.c. Includes the quicksort stabilizing layer
  from John P. Linderman. -Msort=qsort or -Msort=fast is
  faster than without (or with -Msort=mergesort or -Msort=safe)
  for short random inputs, but for some reason not quite as fast
  as 5.6.1 qsort. More benchmarking, profiling, tuning, and
  optimizing definitely needed.

  p4raw-id: //depot/perl@13179

Part of the reason, I think, was that v5.6 and earlier's sort was a
quicksort, and hence not a stable sort.

By the time v5.8.0 shipped, the only options offered were 'stable',
'_mergesort', '_quicksort' and the alias '_qsort', with the default as
mergesort (which is stable).

qw(_qsort) gets you a quicksort; qw(_qsort stable) gets you a quicksort with
a fallback comparison to guarantee stability (slower, but stable).
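What stability means here can be seen with a small illustration (data and
variable names made up for this example; since v5.8 the default mergesort is
stable, which is the same guarantee C<use sort qw(_qsort stable)> buys on top
of quicksort):

```perl
use strict;
use warnings;

# Pairs sorted by their first field only; the second field shows what
# happens to the relative order of equal keys.
my @pairs = ([2, 'a'], [1, 'b'], [2, 'c'], [1, 'd']);

# Under C<use sort '_qsort'> alone, equal keys could come out in either
# order.  With the stabilizing layer (or the default mergesort), equal
# keys are guaranteed to keep their input order:
my @stable = sort { $a->[0] <=> $b->[0] } @pairs;
# @stable is [1,'b'], [1,'d'], [2,'a'], [2,'c'] - 'b' before 'd',
# 'a' before 'c', exactly as they appeared in the input
```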

Of the 28 distributions that mention C<use sort>, most are in documentation:

http://grep.cpan.me/?q=%5Cbuse%5Cs%2Bsort.*%28stable%7Cqsort%7Cquicksort%7Cmergesort%7Csafe%7Cfast%29

All the code is C<use sort 'stable'> with 3 exceptions.

1) This commented-out code in a regression test:

  # uncomment to enable compatibility with perl versions prior to 5.8
  #use sort '_quicksort';

2) 3 examples in Module-Pragma-0.02/example/sort.t
  (which is a copy of the core sort pragma from v5.10.0)

3) 2 uses in Class::Std::Containers, which fails tests with v5.10.0 and later
  (and hasn't been updated to fix this)

So,

1) nothing will break if we remove _qsort.

2) sort.pm's documentation, from the start, has stated this:

  You can force the
  choice of algorithm with this pragma, but this feels heavy-handed,
  so the subpragmas beginning with a C<_> may not persist beyond Perl 5.8.

3) We shave off some source code:

embed.fnc | 3 +-
embed.h | 3 +-
pp_sort.c | 866 -------------------------------------------------------------
proto.h | 7 +-
4 files changed, 6 insertions(+), 873 deletions(-)

  [plus a bit in lib/sort.pm, which I didn't touch for this prototype]

4) We reduce the size of pp_sort.o by about 3K (or about 0.25% of perl)

but it's not exactly high-maintenance code.

So, should we put _qsort through a deprecation cycle, and drop it in v5.23?

I'm not actually sure if it's worth it. Do the small gains outweigh the small
churn?

Nicholas Clark

p5pRT commented Sep 6, 2013

From @shlomif

Hi Nicholas and all,

On Fri, 06 Sep 2013 05:42:18 -0700
Nicholas Clark (via RT) <perlbug-followup@perl.org> wrote:

but it's not exactly high-maintenance code.

So, should we put _qsort through a deprecation cycle, and drop it in v5.23?

I'm OK with either way. I never used the "use sort" pragma.

Regards,

  Shlomi Fish

--


Shlomi Fish       http://www.shlomifish.org/
Freecell Solver - http://fc-solve.shlomifish.org/

<petn-randall> Writing your own nirvana may be easier than writing a good
  blog engine ;)
  — http://xrl.us/botw86

Please reply to list if it's a mailing list post - http://shlom.in/reply .

p5pRT commented Sep 6, 2013

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented Sep 6, 2013

From @tsee

On 09/06/2013 02:42 PM, Nicholas Clark (via RT) wrote:

1) nothing will break if we removed _qsort.

... on CPAN.

but it's not exactly high-maintenance code.

It's certainly not, but pp_sort.c as a whole is moderately complex.
Reducing the amount of code there can only be good.

So, should we put _qsort through a deprecation cycle, and drop it in v5.23?

I'm not actually sure if it's worth it. Do the small gains outweigh the small
churn?

+1 to see it go. I'm trying not to grandly suggest that somebody (not
me, likely not Nicholas) should try implementing timsort for us now
instead of the current merge-alike. :)

--Steffen

p5pRT commented Sep 6, 2013

From @jplinderman

Part of the reason, I think, was that v5.6 and earlier's sort was a
quicksort, and hence not a stable sort.

Stability was definitely a feature, but my best argument to Jarkko was that
far too many common sequences (organ pipe, sawtooth, etc.) went quadratic,
and Peter McIlroy's merge sort did a particularly nice job on sequences
with pre-existing order (linear time on sorted input).

The only sequences (that I could think of, anyway) where it was clear that
qsort would outperform merge sort were those containing many repetitions of
a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k
passes would be required, where merge sort might need log N.

My email address in pp_sort.c is shown as jpl@research.att.com. That will
stop working in a year or so. Better would be jpl.jpl@gmail.com. Peter is
no longer at Lucent, so his email address, pmcilroy@lucent.com, probably
stopped working about a decade ago. I don't have a current email address
for him.

I have no strong opinion about removing qsort. Seems like a good idea to
me.

p5pRT commented Sep 6, 2013

From victor@vsespb.ru

2013/9/6 John P. Linderman <jpl.jpl@gmail.com>

The only sequences (that I could think of, anyway) where it was clear that
qsort would outperform merge sort were those containing many repetitions of
a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k
passes would be required, where merge sort might need log N.

It seems memory usage can be better with qsort than with mergesort (not
sure about the perl implementation).

http://en.wikipedia.org/wiki/Merge_sort

Merge sort's most common implementation does not sort in place; therefore,
the memory size of the input must be allocated for the sorted output to be
stored in (see below for versions that need only *n*/2 extra spaces).

http://en.wikipedia.org/wiki/Quicksort

The disadvantage of the simple version above is that it requires O(*n*)
extra storage space, which is as bad as merge sort
(http://en.wikipedia.org/wiki/Merge_sort). The additional memory
allocations required can also drastically impact speed and cache
performance in practical implementations. There is a more complex version
which uses an in-place (http://en.wikipedia.org/wiki/In-place_algorithm)
partition algorithm and can achieve the complete sort using O(log *n*)
space (not counting the input) on average (for the call stack,
http://en.wikipedia.org/wiki/Call_stack). We start with a partition
function:

p5pRT commented Sep 6, 2013

From @jplinderman

All true, all valid observations. The merge sort implementation uses N
additional *pointers* (not records); the qsort implementation sorts in
place (unless you want stability, in which case it, too, uses another set
of pointers, same as the merge sort implementation).

I started programming back in the 60's, when a few kilobytes of memory was
all there was, and you'd do all sorts of things that would now be
considered inexcusable to save a few bytes (like using only the last two
digits of the year, leading to the Y2K "crisis"). I often have to remind
myself that memory is now darn near free, but my time is more expensive
every year (at least until AT&T "surplussed" me). If you need to worry
about double the number of pointers, perhaps perl is not the
implementation language you should be using.

Qsort, with its divide and conquer approach, automatically ends up doing a
lot of processing in the fastest (i.e. smallest) cache, which is quite
ironic, since quicksort was invented when memory was monolithic and caches
didn't exist. But I attempted to fix that flaw in merge sort (at the
expense of some additional stack space... it took some discipline to
ignore that in the environment I now program in, rather than the one where
I learned to program :-).

My attitude now (since you didn't ask :-) is that space matters a whole
lot less than time, and cache hits matter, because they have a major
impact on time. And maintainability matters a lot too, because it relates
to correctness, which really matters more than just about everything,
which is why I wouldn't much mind qsort disappearing. I hope to live long
enough to experience another major paradigm shift (like exploiting
multiple processors in parallel).

On Fri, Sep 6, 2013 at 3:34 PM, Victor Efimov <victor@vsespb.ru> wrote:

Seems memory usage can be better with qsort than with mergesort [...]

p5pRT commented Sep 6, 2013

From @ap

* John P. Linderman <jpl.jpl@gmail.com> [2013-09-07 00:35]:

I often have to remind myself that memory is now darn near free,
but my time is more expensive every year

  Computer’s Perspective on Moore’s Law:
  Humans are getting more expensive at an exponential rate.
  —Mark S. Miller

--
Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented Sep 7, 2013

From victor@vsespb.ru

I was able to reproduce a case where qsort does 3 times fewer comparisons
than mergesort.

However, I was unable to find a case where qsort is faster when the
comparison function is simple; obviously this can be the case for complex
comparison functions. One also can/should use the Schwartzian transform
(http://en.wikipedia.org/wiki/Schwartzian_transform) anyway if the
comparison function is always slow, and this limits the benefits of qsort.

==
use strict;
use warnings;

my @a = (1..3) x 100_000;
our $i;

sub mysub
{
    ++$i;
    $a <=> $b;
}

my ($q, $m);
{
    use sort '_quicksort';
    local $i = 0;
    my @b = sort mysub @a;
    $q = $i;
    print "quicksort: $i\n";
};
{
    local $i = 0;
    my @b = sort mysub @a;
    $m = $i;
    print "mergesort: $i\n";
}

print $q/$m, "\n";
__END__
quicksort: 599997
mergesort: 1746356
0.343570841225958

2013/9/7 John P. Linderman <jpl.jpl@gmail.com>

All true, all valid observations. [...]

p5pRT commented Sep 9, 2013

From @jplinderman

On Fri, 06 Sep 2013 18:54:12, Steffen Mueller <smueller@cpan.org> wrote:

+1 to see it go. I'm trying not to grandly suggest
that somebody (not me, likely not Nicholas) should try
implementing timsort for us now instead of the current
merge-alike. :)

I hope it's clear who copied whom. If not, read on.

Anything indented following a "> " is text extracted from

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

Tim Peters' description of "timsort". Let's clear the air right away.
I'm not a fan of Tim Peters, mostly because he acknowledges that one of
the important ideas used in "timsort" originated in a paper by
Peter McIlroy:

I first learned about the galloping strategy in a related context; see:

"Adaptive Set Intersections, Unions, and Differences" (2000)
Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro

and its followup(s). An earlier paper called the same strategy
"exponential search":

"Optimistic Sorting and Information Theoretic Complexity"
Peter McIlroy
SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
467-474, Austin, Texas, 25-27 January 1993.

and it probably dates back to an earlier paper by Bentley and Yao. The
McIlroy paper in particular has good analysis of a mergesort that's
probably strongly related to this one in its galloping strategy.

In http://svn.python.org/projects/python/trunk/Objects/listobject.c,
the closest thing I can find to the official source for "timsort",
Peter's name is never mentioned. Nor does it appear in Wikipedia's
entry for "timsort". That may or may not be Tim's fault.

It turns out that Perl is moving to a stable mergesort

Seems that Tim was aware of the sort code Jarkko and I were kicking about.
I never attempted to hide the fact that all the good ideas came from
Peter's code. Indeed, we came within a gnat's eyelash of rejecting
my implementation of Peter's algorithm because of licensing issues.

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-09/msg00027.html
http://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/perl5-porters/2000-09/msg00207.html?56#mfs

Anyway, we have, at the very start

Intro
-----
This describes an adaptive, stable, natural mergesort, modestly called
timsort (hey, I earned it <wink>). It has supernatural performance on
many kinds of partially ordered arrays (less than lg(N!) comparisons
needed, and as few as N-1)

<wink> doesn't cut the mustard. He borrowed some of the best parts
of timsort from Peter's paper, and, to my taste, doesn't adequately
acknowledge that. OK, air cleared. Let's move on, with the
understanding that I am not totally objective.

Since "Peter" can too easily be confused with "Peters", I'll refer to
the sorts as Tim's and Peter's, with the understanding that "Peter"
refers to Peter McIlroy.

I'm unaware of any algorithms whose performance is "supernatural".
The best algorithms I know of are the product of careful analysis.
Peter did that in the SODA paper, which, I admit, I mis-cited
in the perl sort code, but is accurately cited above. He further
enhanced the algorithms, and shared them with me, and Doug McIlroy,
at Bell Labs. I contributed only modestly to Peter's ideas (with
suggestions like, "stability is really worthwhile, and here's how
we can achieve it efficiently".) One of the improvements Peter made,
which is in the perl source code, but not the SODA paper, was how to
detect "natural runs" without wasting a lot of comparisons. That
deserves a paragraph or so of its own, because it differs from the
way that (I understand) "timsort" works. I get my understanding
of how "timsort" works from

http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17

(which I believe to have a few bugs, but is at least comprehensible
to someone fluent in C, but not in python.) The source I cited above
is too full of python-stuff to be worth my effort to grok.

So let's start with some simple combinatorics, and let's assume that
no element to be sorted is identical with something else to be sorted,
since that simplifies the analysis without doing too much damage to
the truth. With a sequence of two (distinct) elements,
we have two possibilities. (Forgive me if this is dead obvious, I'd like
to make it clear to people to whom it is not dead obvious.)

1,2 and
2,1

in which case we either have a strictly increasing sequence or a
strictly decreasing sequence (where we have chosen two integers whose
relative values stand in for the way in which two distinct objects
compare to each other). So any two distinct values are either in
strictly increasing or strictly decreasing order.

Add another (distinct) value, 3, to our two original values and we have

1,2,3 started increasing, continued to be
1,3,2 started increasing, then reversed
2,1,3 started decreasing, then reversed
2,3,1 started increasing, then reversed
3,1,2 started decreasing, then reversed
3,2,1 started decreasing, continued to be

In random (distinct) data, there's only 1 chance in 3 that looking for
an extension of the prevailing run will pay off. Not so good. But it
gets worse. Of the sequences that have initial runs of 3

1,2,3,4 started increasing, continued to be
1,2,4,3 started increasing, then reversed
1,3,4,2 started increasing, then reversed
2,3,4,1 started increasing, then reversed
3,2,1,4 started decreasing, then reversed
4,2,1,3 started decreasing, then reversed
4,3,1,2 started decreasing, then reversed
4,3,2,1 started decreasing, continued to be

Now there's only 2 chances in 8 that the comparison will extend the
run. Or, of the N! ways N distinct elements can appear in a sequence,
only two will yield a run of N. Unless you have some reason to
believe the data are "orderly", looking for runs longer than 2 is
a sucker bet. Tim knows that

... timsort knows darned well it's doing compares that
won't pay on random data ...
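The arithmetic above is easy to check mechanically. This short script
(written for this discussion, not taken from pp_sort.c) enumerates
permutations and counts how often the initial run of 2 extends:

```perl
use strict;
use warnings;

# Return all permutations of the given list, as array refs.
sub permutations {
    my @items = @_;
    return ([]) unless @items;
    my @perms;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @perms, [ $picked, @$_ ] for permutations(@rest);
    }
    return @perms;
}

# Length of the strictly monotonic run at the front of a sequence.
sub initial_run_length {
    my @a   = @_;
    my $dir = $a[0] < $a[1] ? -1 : 1;   # the <=> value that extends the run
    my $len = 2;
    $len++ while $len < @a && ($a[$len - 1] <=> $a[$len]) == $dir;
    return $len;
}

# Of the 6 permutations of 3 elements, only 2 extend the initial pair.
my @p3 = grep { initial_run_length(@$_) >= 3 } permutations(1 .. 3);
print scalar(@p3), " of 6 permutations of 3 extend the initial pair\n";

# Of the 8 permutations of 4 with an initial run of 3, only 2 extend it.
my @run3 = grep { initial_run_length(@$_) >= 3 } permutations(1 .. 4);
my @run4 = grep { initial_run_length(@$_) >= 4 } @run3;
print scalar(@run4), " of ", scalar(@run3), " initial runs of 3 extend to 4\n";
```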

But, of course, if you want to detect runs longer than 2, you must
risk some comparisons that aren't going to pan out. That's what
the "Optimistic" part of Peter's paper is about. But the devil is
in the details. Timsort appears to bust large sequences into shorter
ones of length between 32 and 64, ostensibly to better balance the
merge phase. Then it always tries to extend the initial run of 2,
in ascending or descending order, to include as many additional
elements as possible. If it extends beyond the fixed-length sequence,
it keeps on going, possibly extending to the very end of the sequence
to be sorted. If the run ends before the fixed-length segment is
exhausted, timsort reverses the run if it was strictly decreasing,
then uses a binary insertion sort to put the entire fixed-length
sequence in sorted order. Two problems. As we observed (and Tim
knows), runs as long as 32 to 64 are, a priori, spectacularly
improbable. Unless there's a lot of structure to the input sequence,
the setup stage is going to end up "wasting" a comparison that
doesn't extend a run, then doing an insertion sort on what's
left over. The binary insertion sort holds the number of
comparisons to an acceptable level, but the number of moves needed
to insert additional elements into the sorted run can get ugly.
Second, and more important, runs are detected only if they
start at the beginning of a fixed-length segment. A nice long
run might start somewhere else in the segment, but it will be
overlooked. In fact, if you want to lay a world of hurt on timsort,
figure out a number of inputs that will lead to fixed length
segments of size, say, 64, take a strictly decreasing sequence
of length 64, move the last element to the front, and fill up
the input with repetitions of this sequence. Each fixed length
sequence will start with an increasing sequence of length 2,
but the next (wasted) comparison will encounter a reversal,
so the decreasing sequence of length 62 will be insertion sorted.
Insertion sort on decreasing elements is worst case for moves.
Each element after the first one (which was, by construction,
the minimal element) will have to be moved each time a new element
is added. OK, 1800-some moves per segment isn't going to take
terribly long, but if the idea behind fixed length segments was
to reduce moves in the merge phase, you have started out in a large
hole.

Peter's sort addressed both of these problems. (The run setup
phase was not part of his Soda paper, but it has been in pp_sort.c
from the beginning, because he came up with it while he was working
at Bell Labs, and sharing the source with me.) Rather than trying
to extend every run of length 2, we look at the sense of PTHRESH
(defaulting to 8) pairs in a row. If they are all the same, either
increasing or strictly decreasing, we have fairly compelling evidence
of a run. Only then do we "risk" a comparison to see if adjacent
runs can be "bridged" into a longer one. If all PTHRESH pairs turn
out to form a run, we have a run of 16, a fine sign of orderly data.
An attempt is made to extend the run, one element at a time, until
the sense of a comparison is no longer compatible with extending the
run (or until we have reached the end of the input). If we discover
a reversal, we keep whatever runs we got by bridging pairs, and push
ahead, inspecting the sense of adjacent pairs. PTHRESH is a measure
of "risk aversion" about "wasting" a comparison. Higher values
demand more evidence of a possible long run before taking that risk,
but also overlook any runs shorter than 2*PTHRESH. The merge phase,
as described in the SODA paper (and as implemented by "galloping"
in timsort) benefits from having adjacent runs in nearly sorted order,
(although they do not benefit as much as if the runs had previously
been combined into a single run). So a conservative PTHRESH seems
appropriate. I like to think about PTHRESH this way: in random data,
a pair is equally likely to be in increasing or strictly decreasing
order. PTHRESH pairs in a row is like flipping heads (or tails)
PTHRESH times in a row. That can happen with a fair coin, and will
happen about once every 2**(PTHRESH-1) times, but even if we lose,
it cannot happen often enough to push us far from the theoretical
limit on comparisons used. Note, too, that runs of length 2*PTHRESH
or more need not start on any particular boundary. The same sequence
that gives timsort heartburn presents no problems to Peter's sort.
We get a run of length 2 at the start, then discover the run of
length 63 that follows it (lapping into timsort's next fixed length
segment), then a bunch of runs of length 64 (each lapping into the
next timsort segment), ending up with a run of length 63. We've gotten
just one more sorted segment to deal with (the first one), and we got
them with many fewer comparisons and moves. Of course, one shouldn't
use a single pathological input sequence to claim universal superiority,
but Peter's ability to find runs anywhere looks like a winner, unless
balanced merges can be shown to matter more. I await non-supernatural
evidence.
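The PTHRESH heuristic can be sketched as follows (a much-simplified
illustration with made-up names, not pp_sort.c's actual control flow, and
glossing over ties between equal elements):

```perl
use strict;
use warnings;

use constant PTHRESH => 8;

# Sample the sense of PTHRESH adjacent pairs starting at $start; only if
# all of them agree do we have "compelling evidence" of a run, justifying
# the extra comparisons needed to try bridging the pairs together.
sub pairs_agree {
    my ($a, $start) = @_;
    my $first;
    for my $k (0 .. PTHRESH - 1) {
        my $i = $start + 2 * $k;
        return 0 if $i + 1 > $#$a;              # ran off the end
        my $sense = $a->[$i] <=> $a->[$i + 1];  # -1 ascending, 1 descending
        $first //= $sense;
        return 0 if $sense != $first;           # mixed evidence: give up
    }
    return 1;
}
```

Note that agreeing pairs are only *evidence*: a sequence like (1,2) repeated
has every pair ascending but is not one run, which is exactly why the bridging
comparisons are still "risked" afterwards.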

I won't go into much detail about the merge phase, which is really the
essence of Peter's SODA paper. When merging two runs, the basic idea
is to determine which run has the greater element. We then find *all*
the elements in the other run for which it is also greater, and add them
to the merged run. Now we have found a greater element in the opposite
run, and, as before, we use it as a "wedge" in the other array to find
*all* the elements in the other array for which it is greater. And so on.
The obvious way to "position the wedge" is to compare it to each element
in turn, and stop when we reach a greater one. If the input is random,
long runs are improbable, much as long runs of adjacent pairs are
improbable in the original setup. So if we haven't found a greater
element after several comparisons, we have evidence of order, and we
can justify skipping an element. If we find the following element
is still not greater, we know the element we skipped couldn't have been
greater, either. So we skip 2 elements, then 4, then 8, and so on,
until we either find a greater element, or we reach the end of the list.
We refer to this as "ramping up" in pp_sort.c. If we encounter a greater
element before reaching the end of the list, we need to look back for the
first such element. We go back half the size of the last step, then
half that size, until we encounter an element that is not greater.
We call this "ramping down". Then we turn around again and look in
the other direction, reducing the step size by half, until we finally
home in on the least element where we compared greater. Peter refers
to this in the SODA paper as "insertion sort with differential search".
Timsort refers to it as "galloping". As with taking a chance on extending
a run in the initial setup, this sometimes "wastes" a comparison.
For example, if we skip an element, and find we are greater than the
next element, we have to check the element we skipped. If we are also
greater than that element, we would have found it immediately if
we had just considered every element. So we don't want to start
ramping up too soon. In Peter's sort, this is parameter RTHRESH,
defaulting to 6. We check up to RTHRESH elements in a row before
ramping up. That should avoid wasted compares on random data, but
go sub-linear on ordered data. This is a fixed threshold in Peter's
sort, an adjustable value in timsort. As far as I can tell, this is
a religious issue. Should previous behavior be predictive of future
behavior? If you believe yes, you should adjust the threshold up or
down. If you think no, just leave it fixed. Tim seems to agree it
doesn't matter much. Pay particular attention to the last sentence.

We can't guess in advance when it's going to win, though,
so we do one pair at a time until the evidence seems strong
that galloping may pay. MIN_GALLOP is 7, and that's pretty
strong evidence. However, if the data is random, it simply
will trigger galloping mode purely by luck every now and
again, and it's quite likely to hit one of the losing cases
next. On the other hand, in cases like ~sort, galloping
always pays, and MIN_GALLOP is larger than it "should
be" then. So the MergeState struct keeps a min_gallop
variable that merge_lo and merge_hi adjust: the longer
we stay in galloping mode, the smaller min_gallop gets,
making it easier to transition back to galloping mode (if
we ever leave it in the current merge, and at the start
of the next merge). But whenever the gallop loop doesn't
pay, min_gallop is increased by one, making it harder to
transition back to galloping mode (and again both within a
merge and across merges). For random data, this all but
eliminates the gallop penalty: min_gallop grows large
enough that we almost never get into galloping mode.
And for cases like ~sort, min_gallop can fall to as low
as 1. This seems to work well, but in all it's a minor
improvement over using a fixed MIN_GALLOP value.
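The ramp-up/ramp-down search described above can be sketched like this (a
simplified illustration with made-up names, ignoring the RTHRESH linear
prelude and the stability machinery):

```perl
use strict;
use warnings;

# Find how many leading elements of the sorted run @$run the wedge $w is
# greater than, i.e. the insertion point for $w.  Ramp up in steps of
# 1, 2, 4, 8, ... until we pass an element >= $w (or run out), then ramp
# back down by binary search to home in on the boundary.
sub gallop {
    my ($w, $run) = @_;
    my $n = scalar @$run;
    my ($lo, $hi, $step) = (0, 0, 1);
    # ramp up: everything before $lo is known to be < $w
    while ($hi < $n && $run->[$hi] < $w) {
        $lo = $hi + 1;
        $hi += $step;
        $step *= 2;
    }
    $hi = $n if $hi > $n;
    # ramp down: first element >= $w lies somewhere in [$lo, $hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($run->[$mid] < $w) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}
```

On ordered data the checked positions grow exponentially (0, 1, 3, 7, ...),
which is what makes the search sub-linear.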

When using element w of one run as a "wedge" into the other run,
we cannot just use cmp(w,b) <= 0 as a means of determining whether
w is less than element b from the other. That's ok if
w comes from the earlier run, and b from the later. But if w
comes from the later run, then, to preserve stability over
equal elements, we must treat equal elements as though b,
the element from the earlier run, is strictly less than w.
In Peter's sort, we do this with the simple expedient of
setting integer "sense" to 0 if w comes from the earlier run,
and to -1 if it comes from the later run, then doing all
comparisons as

  cmp(w,b) <= sense

This technique is one of a very few contributions I made as Peter
and I worked on the source. I believe that timsort could be greatly
simplified if it employed this technique, rather than having separate
(but closely related) routines for galloping left and galloping right
or merging left and merging right. Maybe somebody should suggest
it to Tim. He could call it timCmp and claim he earned it <wink>.
OK, no more snarky comments.
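A tiny sketch of that expedient (illustrative names, not pp_sort.c's code):
with sense set per the rule above, one comparison form serves both directions
while preserving stability.

```perl
use strict;
use warnings;

# Does wedge $w sort no later than element $b?  $sense is 0 when $w comes
# from the earlier run and -1 when it comes from the later run, so on ties
# the element from the earlier run always wins - exactly what stability
# requires.
sub wedge_stops_before {
    my ($w, $b, $sense) = @_;
    return ($w <=> $b) <= $sense;
}

# $w from the earlier run: a tie means $w goes first, so
# wedge_stops_before(3, 3, 0) is true.
# $w from the later run: a tie means $b (from the earlier run) goes first,
# so wedge_stops_before(3, 3, -1) is false.
```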

My other contribution, done without Peter watching over my shoulder
to ensure I wasn't doing something stupid, was to merge the initial
set of runs using a divide-and-conquer approach. The code is
extensively commented in pp_sort.c, I won't repeat it here.
It accomplishes two goals. It balances runs early, while they
are still short. And it revisits newly created runs as soon as
balance allows, so it should be more cache-friendly than the
original "grand sweep" approach.

I don't pretend to know how much balanced run lengths contribute to
performance. A huge run is, in an important sense, a celebration
of the reduced number of comparisons it took to create it, N-1
comparisons versus NlgN comparisons. If you blow a few comparisons
(or, much less importantly, moves, which are done in line), you
are still probably well ahead. Maybe some computer science major
can compare Peter's sort to Tim's sort. Let the best sort win!
My money is on Peter's sort.

Here's a more accurate citation for Peter's sort, prior to our working
together. Thanks, Tim! Much more useful than mine.

  "Optimistic Sorting and Information Theoretic Complexity"
  Peter McIlroy
  SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
  467-474, Austin, Texas, 25-27 January 1993.

@p5pRT commented Sep 9, 2013

From @jplinderman

On Fri, 06 Sep 2013 18​:54​:12, Steffen Mueller <smueller@​cpan.org> wrote​:

+1 to see it go. I'm trying not to grandly suggest
that somebody (not me, likely not Nicholas) should try
implementing timsort for us now instead of the current
merge-alike. :)

My recent attempt to copy and paste a commentary on timsort appears to have
ended up where the sun don't shine. I'll attach it, knowing that
attachments don't always show up well, either.

Executive summary​: I see no reason to believe that our sort, based on code
written (and analyzed) by Peter Mcilroy, is not better than timsort. The
(long) attachment explains how I came to that conclusion. If someone
thinks otherwise, the world at large would profit from a careful
comparison. It won't be me, either, doing it (beyond the attached
analysis).

@p5pRT commented Sep 9, 2013

@p5pRT commented Sep 12, 2013

From @nwc10

On Fri, Sep 06, 2013 at 02​:59​:44PM -0400, John P. Linderman wrote​:

My email address in pp_sort.c is shown as jpl@​research.att.com. That will
stop working in a year or so. Better would be jpl.jpl@​gmail.com. Peter is

On Mon, Sep 09, 2013 at 10​:55​:33AM -0400, John P. Linderman wrote​:

Here's a more accurate citation for Peter's sort, prior to our working
together. Thanks, Tim! Much more useful than mine.

"Optimistic Sorting and Information Theoretic Complexity"
Peter McIlroy
SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
467-474, Austin, Texas, 25-27 January 1993.

So this is an appropriate change to make to pp_sort.c?

Inline Patch
diff --git a/pp_sort.c b/pp_sort.c
index b65e9eb..ab4a6fd 100644
--- a/pp_sort.c
+++ b/pp_sort.c
@@ -53,9 +53,15 @@
  * The original code was written in conjunction with BSD Computer Software
  * Research Group at University of California, Berkeley.
  *
- * See also: "Optimistic Merge Sort" (SODA '92)
+ * See also: "Optimistic Sorting and Information Theoretic Complexity"
+ *           Peter McIlroy
+ *           SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms),
+ *           pp 467-474, Austin, Texas, 25-27 January 1993.
  *
- * The integration to Perl is by John P. Linderman <jpl@research.att.com>.
+ * The integration to Perl is by John P. Linderman <jpl.jpl@gmail.com>.
+ *
+ * John described some more details of the implementation here:
+ * http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-09/msg00345.html
  *
  * The code can be distributed under the same terms as Perl itself.
  *


Nicholas Clark
@p5pRT commented Sep 12, 2013

From @nwc10

On Fri, Sep 06, 2013 at 06​:54​:12PM +0200, Steffen Mueller wrote​:

On 09/06/2013 02​:42 PM, Nicholas Clark (via RT) wrote​:

1) nothing will break if we removed _qsort.

... on CPAN.

Good point. Nothing on CPAN will break, and probably nothing else.

but it's not exactly high maintenance code.

It's certainly not, but pp_sort.c as a whole is moderately complex.
Reducing the amount of code there can only be good.

That seems to be the principal benefit.
And the cost is either zero, or pretty close.
(This is one of those points where Google Code Search would have been useful)

Nicholas Clark

@p5pRT commented Sep 12, 2013

From @nwc10

On Fri, Sep 06, 2013 at 02​:59​:44PM -0400, John P. Linderman wrote​:

Part of the reason, I think, was that v5.6 and earlier's sort was a
quicksort, and hence not a stable sort.

Stability was definitely a feature, but my best argument to Jarkko was that
far too many common sequences (organ pipe, sawtooth, etc.) went quadratic,
and Peter Mcilroy's merge sort did a particularly nice job on sequences
with pre-existing order (linear time on sorted input).

The only sequences (that I could think of, anyway) where it was clear that
qsort would outperform merge sort were those containing many repetitions of
a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k
passes would be required, where merge sort might need log N.

The bit that keeps bugging me is "how many people know that their data is
usually going to be in a particular form that qsort favours"?

In that, the number of use cases where it's correct to specify qsort instead
of mergesort is small, the amount of effort needed to determine this is
high, and the gain itself is small?

In which case, it seems that choosing a specific sort algorithm is not the
first place to look when optimising.

Nicholas Clark

@p5pRT commented Sep 12, 2013

From @jplinderman

If you don't mind, I'd prefer to leave off that final reference to my
timsort analysis. You can't possibly get "more details of the
implementation" than pp_sort.c itself, and, particularly if I do something
about making instability an option, you might get directed to something out
of date. Which might also be true if Tim changes timsort. demerphq is
nudging me to create a wikipedia page about perl's sort. *If* I do that,
it would make a better reference, since it could be kept up to date with
current implementations of both.

On Thu, Sep 12, 2013 at 8​:41 AM, Nicholas Clark <nick@​ccl4.org> wrote​:

On Fri, Sep 06, 2013 at 02​:59​:44PM -0400, John P. Linderman wrote​:

My email address in pp_sort.c is shown as jpl@​research.att.com. That
will
stop working in a year or so. Better would be jpl.jpl@​gmail.com.
Peter is

On Mon, Sep 09, 2013 at 10​:55​:33AM -0400, John P. Linderman wrote​:

Here's a more accurate citation for Peter's sort, prior to our working
together. Thanks, Tim! Much more useful than mine.

"Optimistic Sorting and Information Theoretic Complexity"
Peter McIlroy
SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms), pp
467-474, Austin, Texas, 25-27 January 1993.

So this is an appropriate change to make to pp_sort.c?

diff --git a/pp_sort.c b/pp_sort.c
index b65e9eb..ab4a6fd 100644
--- a/pp_sort.c
+++ b/pp_sort.c
@@ -53,9 +53,15 @@
  * The original code was written in conjunction with BSD Computer Software
  * Research Group at University of California, Berkeley.
  *
- * See also: "Optimistic Merge Sort" (SODA '92)
+ * See also: "Optimistic Sorting and Information Theoretic Complexity"
+ *           Peter McIlroy
+ *           SODA (Fourth Annual ACM-SIAM Symposium on Discrete Algorithms),
+ *           pp 467-474, Austin, Texas, 25-27 January 1993.
  *
- * The integration to Perl is by John P. Linderman <jpl@research.att.com>.
+ * The integration to Perl is by John P. Linderman <jpl.jpl@gmail.com>.
+ *
+ * John described some more details of the implementation here:
+ * http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-09/msg00345.html
  *
  * The code can be distributed under the same terms as Perl itself.
  *

Nicholas Clark

@p5pRT commented Sep 12, 2013

From @nwc10

On Thu, Sep 12, 2013 at 09​:58​:28AM -0400, John P. Linderman wrote​:

If you don't mind, I'd prefer to leave off that final reference to my
timsort analysis. You can't possibly get "more details of the
implementation" than pp_sort.c itself, and, particularly if I do something
about making instability an option, you might get directed to something out
of date. Which might also be true if Tim changes timsort. demerphq is
nudging me to create a wikipedia page about perl's sort. *If* I do that,
it would make a better reference, since it could be kept up to date with
current implementations of both.

I just culled both lines. If a wikipedia page appears, we can update
pp_sort.c to point to it.

[And then the wikipedia talk folks can debate whether it's notable, and
if they so choose create a 404 error. Until we update the link to point to
their history :-)]

Nicholas Clark

@p5pRT commented Sep 12, 2013

From @cpansprout

On Thu Sep 12 05​:54​:17 2013, nicholas wrote​:

On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote:

Part of the reason, I think, was that v5.6 and earlier's sort was a
quicksort, and hence not a stable sort.

Stability was definitely a feature, but my best argument to Jarkko was that
far too many common sequences (organ pipe, sawtooth, etc.) went quadratic,
and Peter Mcilroy's merge sort did a particularly nice job on sequences
with pre-existing order (linear time on sorted input).

The only sequences (that I could think of, anyway) where it was clear that
qsort would outperform merge sort were those containing many repetitions of
a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k
passes would be required, where merge sort might need log N.

The bit that keeps bugging me is "how many people know that their data is
usually going to be in a particular form that qsort favours"?

In that, the number of use cases where it's correct to specify qsort instead
of mergesort is small, the amount of effort needed to determine this is
high, and the gain itself is small?

In which case, it seems that choosing a specific sort algorithm is not the
first place to look when optimising.

But would it be possible for perl to detect datasets that work better
with qsort? Or (most likely) would the time the detection takes
outweigh the benefits?

--

Father Chrysostomos

@p5pRT commented Sep 12, 2013

From @jplinderman

I'm actually in complete agreement. I just wanted to acknowledge that
perl's mergesort was not always better than perl's quicksort on a fairly
extensive set of inputs. But if you know you are dealing with an input of
the class I mentioned, you'd be better off using a hash than with either
sort. In fact, I am so much in agreement that choosing algorithms is less
meaningful than choosing behavior that I gave the algorithm-choosing subpragmas
funny names, starting with an underscore, and warned that they might not
persist beyond 5.8. They had better longevity than I would have
predicted. Obviously, if qsort is deprecated, so, too, must be the funny
subpragmas.

A line in Peter's SODA paper has always intrigued me

Versus the fastest readily available qsort, [a citation here to Jon
Bentley's and Doug Mcilroy's "Engineering a Sort Function" paper, which had
been submitted, but not yet accepted, for publication] the hybrid [Peter's
mergesort] is 25% slower on random integers; for random strings, time
differences are negligible.

Qsort is constrained to elements of fixed size, and the elements themselves
are moved about. As partitions get smaller and smaller, the elements are
brought closer and closer together, "finding" faster and faster cache
memory, if it exists. Mergesort sorts pointers to elements; the elements
themselves are never moved. Although the pointers will be packed together,
enjoying some of the benefits of whatever cache memory may exist, the
elements remain scattered, probably leading to a cache miss when
referenced. And the qsort comparison routine for comparing integers is
very short and efficient, so saving compares may not be so important. I
figured maybe what Peter reported was a consequence of cache memory and
simple comparisons. But if that were true for integers, why wasn't it also
true for strings? String comparison is more complicated than integer
comparison. Saving comparisons becomes more important. Or maybe because
integers are short, and easy to move about, and strings are (probably, it's
not stated) longer, the extra time spent moving data counterbalances the
advantages of cache memory. In any event, the observed behavior had
plausible explanations.

But now I wonder if there may not have been another contributing cause.
Qsort, unconstrained by stability, is free to ignore the position of equal
integers. Stable mergesort is not. As I mentioned elsewhere, this can
handicap mergesort when the input has many repeated items. If the "random
integers" were selected from a pool significantly smaller than the total
number of elements (we don't know), qsort would have another advantage.
But if (lots of "ifs" piling up here) that factors into the performance
difference Peter mentioned, and the performance difference alluded to in
pp_sort.c

The original merge sort, in use since 5.7, was as fast as, or faster than,
qsort on many platforms, but slower than qsort, conspicuously so, on others.

then the ability to ignore stability will take away that advantage of
quicksort, another reason why quicksort might be deprecated without much
pain. Since I got lots of +1s for taking a look at ignoring stability
(even though most of them came from one person :-), I'll do that. It looks
like the changes to the merge phase will be quite simple, but changes to
the setup of initial runs will be anything but.

On Thu, Sep 12, 2013 at 8​:53 AM, Nicholas Clark <nick@​ccl4.org> wrote​:

On Fri, Sep 06, 2013 at 02:59:44PM -0400, John P. Linderman wrote:

Part of the reason, I think, was that v5.6 and earlier's sort was a
quicksort, and hence not a stable sort.

Stability was definitely a feature, but my best argument to Jarkko was that
far too many common sequences (organ pipe, sawtooth, etc.) went quadratic,
and Peter Mcilroy's merge sort did a particularly nice job on sequences
with pre-existing order (linear time on sorted input).

The only sequences (that I could think of, anyway) where it was clear that
qsort would outperform merge sort were those containing many repetitions of
a small number (k) of keys. Qsort's "fat pivot" guaranteed that only k
passes would be required, where merge sort might need log N.

The bit that keeps bugging me is "how many people know that their data is
usually going to be in a particular form that qsort favours"?

In that, the number of use cases where it's correct to specify qsort instead
of mergesort is small, the amount of effort needed to determine this is
high, and the gain itself is small?

In which case, it seems that choosing a specific sort algorithm is not the
first place to look when optimising.

Nicholas Clark

@p5pRT commented Nov 13, 2017

From zefram@fysh.org

Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

-zefram

@p5pRT commented Nov 13, 2017

From @jplinderman

Zefram wrote​:

Date​: Mon, 13 Nov 2017 02​:44​:28 +0000
Subject​: Re​: [perl #119635] deprecate and remove qsort?
Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

I've been among those proposing that qsort may be taking more code-space
than it's worth (and it was I who put the underscores in the subpragmata).
I'll just point out that there are classes of inputs for which qsort may
well outperform mergesort. If there are a small number of different sort
keys, like two-letter state abbreviations in the US, in a large collection
of records, qsort will need something like log(#-of-different-keys) rather
than log(#-of-records) passes. And it sorts in-place, reducing storage
requirements. Both of these advantages disappear if you demand both qsort
and stability. So that combination almost surely should die. I was planning
to replace the existing qsort code with the Bentley/McIlroy algorithm.
<https://algs4.cs.princeton.edu/references/papers/bentley-mcilroy.pdf> But
I'm equally happy to scrap qsort altogether.

@p5pRT commented Nov 15, 2017

From @jplinderman

A couple days ago, I sent a reply to Zefram's proposal to remove qsort. I
haven't seen my reply in the porter's digest. Maybe I just missed it, so I
won't repeat the entire post. The important part was that there are classes
of inputs for which qsort will outperform mergesort. If the number of
distinct keys, k, is much smaller than the number of records, r, qsort will
only need log(k) passes to mergesort's log(r). For example, there are only
50-some states and territories in the US, so sorting by state/territory
will benefit from qsort, 6 or 7 passes regardless of the number of records.
Perhaps what is needed is a better subpragma, maybe something like
"few_keys" rather than "_qsort"?
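The fat-pivot effect described above can be sketched as a three-way partition. This is a hypothetical Python rendering for illustration (the production candidate was Bentley/McIlroy's in-place C qsort, not this list-building version):

```python
def qsort3(a):
    """Quicksort with a three-way ('fat pivot') partition: one pass
    splits the input into elements <, ==, and > the pivot.  The
    equal block never recurses, so with k distinct keys among n
    records the recursion depth is about log(k), not log(n)."""
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]
    lt = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    gt = [x for x in a if x > pivot]
    return qsort3(lt) + eq + qsort3(gt)
```

Sorting, say, millions of records on a 50-value state key bottoms out after only a handful of partitioning levels, which is the advantage a "few_keys" hint could express.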

@p5pRT commented Nov 16, 2017

From @xsawyerx

On Sun, 12 Nov 2017 18​:44​:44 -0800, zefram@​fysh.org wrote​:

Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

Agreed.

@p5pRT commented Nov 17, 2017

From zefram@fysh.org

Done in commit e2091bb.

-zefram

@p5pRT commented Nov 19, 2017

@xsawyerx - Status changed from 'open' to 'pending release'

@p5pRT commented Dec 6, 2017

From @avar

On Thu, Nov 16 2017, Sawyer X. via jotted​:

On Sun, 12 Nov 2017 18​:44​:44 -0800, zefram@​fysh.org wrote​:

Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

Agreed.

All in all I don't mind that this got removed like this, but I thought
I'd point out that in the general case I think this sort of a removal is
a bit cavalier given our current deprecation policy.

I.e.​:

1. Obscure feature is discovered
2. There's a grep of the CPAN discovering no uses for it
3. It's removed in one cycle, and existing uses for it turn into an
   error if people upgrade Perl.

There's a big DarkPAN out there. E.g. Sawyer and I both work for a
well-known perl company with a big codebase whose main money-making app
won't compile with 5.28 because of this, and that's some new code (C<use
sort '_quicksort'>) added in April of this year.

(Now, in that particular case I have no idea why _quicksort was used,
and it can probably just be removed, I've contacted the author).

I wonder if we should at least do this for now​:

  diff --git a/lib/sort.pm b/lib/sort.pm
  index 659f3e4f4d..c48e41cf90 100644
  --- a/lib/sort.pm
  +++ b/lib/sort.pm
  @@ -24,6 +24,8 @@ sub import {
               $^H{sort} &= ~$sort::unstable_bit;
           } elsif ($_ eq 'defaults') {
               $^H{sort} =   0;
  +        } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
  +            warn "sort: the $_ subpragma is a no-op since 5.28!";
           } else {
               require Carp;
               Carp::croak("sort: unknown subpragma '$_'");
  @@ -43,6 +45,8 @@ sub unimport {
           if ($_ eq 'stable') {
               $^H{sort} &= ~$sort::stable_bit;
               $^H{sort} |=  $sort::unstable_bit;
  +        } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
  +            warn "sort: the $_ subpragma is a no-op since 5.28!";
           } else {
               require Carp;
               Carp::croak("sort: unknown subpragma '$_'");

Or maybe it really is better if this dies at compile-time.

FWIW I initially ran into this because I was hacking some pod docs that
had to do with references to very old version (5.[68]), hadn't rebased
in a while, and had this commit ready to push before I ran into a
conflict with Zefram​:

  commit 1bb8507c8f
  Author​: Ævar Arnfjörð Bjarmason <avar@​cpan.org>
  Date​: Wed Dec 6 13​:26​:32 2017 +0000

  perlfunc​: shorten overly verbose & out of date mention of sort.pm

  This paragraph saying a 5.8 feature "may persist" in future versions
  is obviously wildly out of date as we have it now almost 16 years
  later. All of this is covered in the sort.pm documentation, it's
  enough to just briefly mention that perl uses a stable mergesort, and
  point to the other docs in case someone wants to use quicksort
  instead.
  ---
  pod/perlfunc.pod | 18 +++++-------------
  1 file changed, 5 insertions(+), 13 deletions(-)

  diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
  index b7e001fcc8..1d117994c1 100644
  --- a/pod/perlfunc.pod
  +++ b/pod/perlfunc.pod
  @@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>)
  actually modifies the element in the original list. This is usually
  something to be avoided when writing clear code.

  -Perl 5.6 and earlier used a quicksort algorithm to implement sort.
  -That algorithm was not stable and I<could> go quadratic. (A I<stable> sort
  -preserves the input order of elements that compare equal. Although
  -quicksort's run time is O(NlogN) when averaged over all arrays of
  -length N, the time can be O(N**2), I<quadratic> behavior, for some
  -inputs.) In 5.7, the quicksort implementation was replaced with
  -a stable mergesort algorithm whose worst-case behavior is O(NlogN).
  -But benchmarks indicated that for some inputs, on some platforms,
  -the original quicksort was faster. 5.8 has a L<sort> pragma for
  -limited control of the sort. Its rather blunt control of the
  -underlying algorithm may not persist into future Perls, but the
  -ability to characterize the input or output in implementation
  -independent ways quite probably will.
  +Perl uses a I<stable> mergesort algorithm whose worst-case behavior is
  +O(NlogN). Benchmarks indicate that for some inputs, on some platforms,
  +quicksort can be faster, although it can have much worse performance
  +for pathological inputs. See the L<sort> module for how the sort
  +algorithm can be tweaked with L<a pragma|perlpragma>.

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had been serious about deprecating this, given that RT #119635 to remove it
didn't even get filed until 5.18 had been released!

@p5pRT commented Dec 6, 2017

From zefram@fysh.org

Avar Arnfjorth Bjarmason wrote​:

I wonder if we should at least do this for now​:
...
+ } elsif (/^_q(?​:uick)?sort$/ or $_ eq '_mergesort') {
+ warn "sort​: the $_ subpragma is a no-op since 5.28!";

There's no real need for that​: the error that it'll produce leads one
straight to the problem, and the change required to get it running again
is very straightforward. But if we were to make these subpragmata legal
again, it shouldn't just be in the open-ended way that you suggest.
We don't want to have that baggage around forever. We should deprecate
them, with a defined deletion date, and on the deletion date they should
become illegal (as they are now in blead). The immediate implication
is that, rather than an unconditional warning of the type you show,
they should elicit a categorised deprecation warning that advertises
the deletion date.

-zefram

@p5pRT commented Dec 6, 2017

From @jplinderman

I'll admit that I was a bit disappointed that my response to Zefram's
proposal to pull the plug on qsort somehow never appeared in the digest,
and that subsequent comments concerning qsort appeared in the digest, but
the full text of those comments didn't appear in my version of the digest,
so I couldn't reply to them either. It doesn't matter much now, but I
wonder if other porters see similar problems.

I spoke with Jon Bentley last week. He confirmed my observation that if N
records have only K distinct keys, qsort can be made to behave like NlogK
rather than NlogN. If you are sorting on something like month (K==12) or
states in the US (K==50), that can be a significant improvement when N is
large. After a bit of thought, I realized that qsort can be stabilized by
allocating another copy of the array pointers *without* treating all keys
as being distinct (as the current stabilized implementation, of which I am
guilty, does, losing the logK vs logN advantage).

It would be a shame to deny users the performance gains that qsort makes
possible in certain circumstances. But the subpragmata for achieving those
gains (guilty again) were butt-ugly. Users shouldn't be mentioning
algorithms, the details of which are subject to change anyway. On the other
hand, it is entirely reasonable that users be able to express behavior,
like stability, which may or may not be important to them. We already have
subpragmata for stability. Users should be able to describe properties of
the data of which they may be aware (and which the sort code cannot
determine), like estimates of the number of distinct keys. Something
logarithmic like 10KEYS, 100KEYS, 1000KEYS etc could express estimates. If
we want to do the best we can for the users, the discussion should be about
good, subpragmatic, ways to express qualities of the data, and let the sort
code figure out the best available algorithms.

On Wed, Dec 6, 2017 at 9​:23 AM, Ævar Arnfjörð Bjarmason <avarab@​gmail.com>
wrote​:

On Thu, Nov 16 2017, Sawyer X. via jotted​:

On Sun, 12 Nov 2017 18​:44​:44 -0800, zefram@​fysh.org wrote​:

Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

Agreed.

All in all I don't mind that this got removed like this, but I thought
I'd point out that in the general case I think this sort of a removal is
a bit cavalier given our current deprecation policy.

I.e.​:

1. Obscure feature is discovered
2. There's a grep of the CPAN discovering no uses for it
3. It's removed in one cycle, and existing uses for it turn into an
   error if people upgrade Perl.

There's a big DarkPAN out there. E.g. Sawyer and I both work for a
well-known perl company with a big codebase whose main money-making app
won't compile with 5.28 because of this, and that's some new code (C<use
sort '_quicksort'>) added in April of this year.

(Now, in that particular case I have no idea why _quicksort was used,
and it can probably just be removed, I've contacted the author).

I wonder if we should at least do this for now​:

    diff --git a/lib/sort.pm b/lib/sort.pm
    index 659f3e4f4d..c48e41cf90 100644
    --- a/lib/sort.pm
    +++ b/lib/sort.pm
    @@ -24,6 +24,8 @@ sub import {
                 $^H{sort} &= ~$sort::unstable_bit;
             } elsif ($_ eq 'defaults') {
                 $^H{sort} =   0;
    +        } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +            warn "sort: the $_ subpragma is a no-op since 5.28!";
             } else {
                 require Carp;
                 Carp::croak("sort: unknown subpragma '$_'");
    @@ -43,6 +45,8 @@ sub unimport {
             if ($_ eq 'stable') {
                 $^H{sort} &= ~$sort::stable_bit;
                 $^H{sort} |=  $sort::unstable_bit;
    +        } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
    +            warn "sort: the $_ subpragma is a no-op since 5.28!";
             } else {
                 require Carp;
                 Carp::croak("sort: unknown subpragma '$_'");

Or maybe it really is better if this dies at compile-time.

FWIW I initially ran into this because I was hacking some pod docs that
had to do with references to very old version (5.[68]), hadn't rebased
in a while, and had this commit ready to push before I ran into a
conflict with Zefram​:

    commit 1bb8507c8f
    Author: Ævar Arnfjörð Bjarmason <avar@cpan.org>
    Date:   Wed Dec 6 13:26:32 2017 +0000

        perlfunc: shorten overly verbose & out of date mention of sort.pm

        This paragraph saying a 5.8 feature "may persist" in future versions
        is obviously wildly out of date as we have it now almost 16 years
        later. All of this is covered in the sort.pm documentation, it's
        enough to just briefly mention that perl uses a stable mergesort, and
        point to the other docs in case someone wants to use quicksort
        instead.
    ---
     pod/perlfunc.pod | 18 +++++-------------
     1 file changed, 5 insertions(+), 13 deletions(-)

    diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
    index b7e001fcc8..1d117994c1 100644
    --- a/pod/perlfunc.pod
    +++ b/pod/perlfunc.pod
    @@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>)
     actually modifies the element in the original list.  This is usually
     something to be avoided when writing clear code.

    -Perl 5.6 and earlier used a quicksort algorithm to implement sort.
    -That algorithm was not stable and I<could> go quadratic.  (A I<stable> sort
    -preserves the input order of elements that compare equal. Although
    -quicksort's run time is O(NlogN) when averaged over all arrays of
    -length N, the time can be O(N**2), I<quadratic> behavior, for some
    -inputs.) In 5.7, the quicksort implementation was replaced with
    -a stable mergesort algorithm whose worst-case behavior is O(NlogN).
    -But benchmarks indicated that for some inputs, on some platforms,
    -the original quicksort was faster. 5.8 has a L<sort> pragma for
    -limited control of the sort. Its rather blunt control of the
    -underlying algorithm may not persist into future Perls, but the
    -ability to characterize the input or output in implementation
    -independent ways quite probably will.
    +Perl uses a I<stable> mergesort algorithm whose worst-case behavior is
    +O(NlogN). Benchmarks indicate that for some inputs, on some platforms,
    +quicksort can be faster, although it can have much worse performance
    +for pathological inputs. See the L<sort> module for how the sort
    +algorithm can be tweaked with L<a pragma|perlpragma>.

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had been serious about deprecating this, given that RT #119635 to
remove it didn't even get filed until 5.18 had been released!
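As an aside for readers skimming the thread: the stability guarantee that the rewritten perlfunc text leans on is easy to demonstrate concretely. The sketch below uses Python purely for illustration (its built-in `sorted()` happens to be a stable mergesort-family algorithm, like perl's default sort); it is not Perl code.

```python
# "Stable" means elements that compare equal keep their input order.
# Python's sorted() is stable, like perl's default mergesort, so the
# two records with key 2 (and the two with key 1) come out in the
# same relative order they went in.
records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]
by_num = sorted(records, key=lambda r: r[1])
# bob still precedes dave, and alice still precedes carol
print(by_num)
```

With an unstable sort (such as a plain quicksort), either relative order of ("alice", 2) and ("carol", 2) would be a legal result.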

@p5pRT commented Dec 6, 2017

From @jplinderman

ARRGH! My response again disappeared from the digest I just received.
Here's how my copy ends, omitting my reply, whether or not I click on
the View entire message link. Massively annoying!

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had been serious about deprecating this, given that RT #119635 to
remove it didn't even get filed until 5.18 had been released!

...

[Message clipped] View entire message
<https://mail.google.com/mail/u/0/?ui=2&ik=96a8b2a39a&view=lg&msg=1602c7aad545e316>

On Wed, Dec 6, 2017 at 10:37 AM, John P. Linderman <jpl.jpl@gmail.com>
wrote:

I'll admit that I was a bit disappointed that my response to Zefram's
proposal to pull the plug on qsort somehow never appeared in the digest,
and that subsequent comments concerning qsort appeared in the digest, but
the full text of those comments didn't appear in my version of the digest,
so I couldn't reply to them either. It doesn't matter much now, but I
wonder if other porters see similar problems.

I spoke with Jon Bentley last week. He confirmed my observation that if N
records have only K distinct keys, qsort can be made to behave like NlogK
rather than NlogN. If you are sorting on something like month (K==12) or
states in the US (K==50), that can be a significant improvement when N is
large. After a bit of thought, I realized that qsort can be stabilized by
allocating another copy of the array pointers *without* treating all keys
as being distinct (as the current stabilized implementation, of which I am
guilty, does, losing the logK vs logN advantage).
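The NlogK effect described above comes from three-way ("fat pivot") partitioning: all records whose keys equal the pivot are grouped in a single pass and never recursed into, so the recursion depth is governed by the number of distinct keys rather than by N. A minimal sketch of the idea follows, in Python; this is an illustration of the algorithm, not the pp_sort.c implementation.

```python
# Three-way (Dutch national flag) partitioning quicksort.
# With K distinct keys, every run of elements equal to a pivot is set
# aside whole, so only ~K distinct pivot values ever drive recursion:
# roughly N*logK work instead of N*logN.
def quicksort3(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1                  # equal to pivot: leave in the middle
    quicksort3(a, lo, lt - 1)       # elements < pivot
    quicksort3(a, gt + 1, hi)       # elements > pivot; equals never recurse
```

On a million records with K==12 distinct keys (months, say), the partitioning bottoms out after roughly log2(12) levels rather than log2(N).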

It would be a shame to deny users the performance gains that qsort makes
possible in certain circumstances. But the subpragmata for achieving those
gains (guilty again) were butt-ugly. Users shouldn't be mentioning
algorithms, the details of which are subject to change anyway. On the other
hand, it is entirely reasonable that users be able to express behavior,
like stability, which may or may not be important to them. We already have
subpragmata for stability. Users should be able to describe properties of
the data of which they may be aware (and which the sort code cannot
determine), like estimates of the number of distinct keys. Something
logarithmic like 10KEYS, 100KEYS, 1000KEYS etc could express estimates. If
we want to do the best we can for the users, the discussion should be about
good, subpragmatic, ways to express qualities of the data, and let the sort
code figure out the best available algorithms.
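A toy sketch of what such a data-characterizing interface might drive internally. Everything here (the function name, the thresholds, the algorithm labels) is hypothetical and invented for illustration; it is not proposed sort.pm syntax or pp_sort.c logic.

```python
import math

def choose_algorithm(n, distinct_keys_estimate=None):
    """Pick a sort strategy from properties of the data.

    Hypothetical dispatcher: the caller characterizes the input
    ("about K distinct keys") and the sort code, not the user,
    selects an algorithm.  Names and thresholds are illustrative.
    """
    if distinct_keys_estimate is None:
        return "mergesort"            # safe, stable default
    if distinct_keys_estimate < math.sqrt(n):
        return "3-way quicksort"      # NlogK beats NlogN when K << N
    return "mergesort"
```

A user sorting a large file on US state (K==50) could say, in effect, "100KEYS", and the sort code would be free to pick the NlogK strategy.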

On Wed, Dec 6, 2017 at 9:23 AM, Ævar Arnfjörð Bjarmason <avarab@gmail.com>
wrote:

On Thu, Nov 16 2017, Sawyer X. via jotted​:

On Sun, 12 Nov 2017 18:44:44 -0800, zefram@fysh.org wrote:

Discussion on this petered out, having mostly got sidetracked into
timsort. A couple of people were in favour of removing qsort and the
algorithm-specific pragmata, and no one opposed. Adding my opinion,
I too think we should remove these things. Pragmata to specify whether
stability is required make sense; choosing specific algorithms does
not.
I also think, given the underscore spelling and the documented status
of these subpragmata, that we do not need a deprecation cycle to
remove them.

If no one objects, I intend to go ahead with removal.

Agreed.

All in all I don't mind that this got removed like this, but I thought
I'd point out that in the general case I think this sort of removal is
a bit cavalier given our current deprecation policy.

I.e.:

1. An obscure feature is discovered
2. A grep of the CPAN discovers no uses of it
3. It's removed in one cycle, and existing uses of it turn into an
error if people upgrade Perl.

There's a big DarkPAN out there. E.g. Sawyer and I both work for a
well-known perl company with a big codebase whose main money-making app
won't compile with 5.28 because of this, and that's some new code (C<use
sort '_quicksort'>) added in April of this year.

(Now, in that particular case I have no idea why _quicksort was used,
and it can probably just be removed, I've contacted the author).

I wonder if we should at least do this for now:

diff --git a/lib/sort.pm b/lib/sort.pm
index 659f3e4f4d..c48e41cf90 100644
--- a/lib/sort.pm
+++ b/lib/sort.pm
@@ -24,6 +24,8 @@ sub import {
            $^H{sort} &= ~$sort::unstable_bit;
        } elsif ($_ eq 'defaults') {
            $^H{sort} =   0;
+       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
+           warn "sort: the $_ subpragma is a no-op since 5.28!";
        } else {
            require Carp;
            Carp::croak("sort: unknown subpragma '$_'");
@@ -43,6 +45,8 @@ sub unimport {
        if ($_ eq 'stable') {
            $^H{sort} &= ~$sort::stable_bit;
            $^H{sort} |=  $sort::unstable_bit;
+       } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
+           warn "sort: the $_ subpragma is a no-op since 5.28!";
        } else {
            require Carp;
            Carp::croak("sort: unknown subpragma '$_'");

Or maybe it really is better if this dies at compile-time.

FWIW I initially ran into this because I was hacking some pod docs that
had to do with references to very old versions (5.[68]), hadn't rebased
in a while, and had this commit ready to push before I ran into a
conflict with Zefram:

commit 1bb8507c8f
Author: Ævar Arnfjörð Bjarmason <avar@cpan.org>
Date:   Wed Dec 6 13:26:32 2017 +0000

    perlfunc: shorten overly verbose & out of date mention of sort.pm

    This paragraph saying a 5.8 feature "may persist" in future
    versions is obviously wildly out of date, as we have it now almost
    16 years later. All of this is covered in the sort.pm
    documentation; it's enough to just briefly mention that perl uses
    a stable mergesort, and point to the other docs in case someone
    wants to use quicksort instead.
---
pod/perlfunc.pod | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index b7e001fcc8..1d117994c1 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -7307,19 +7307,11 @@ L<C<grep>|/grep BLOCK LIST>)
 actually modifies the element in the original list.  This is usually
 something to be avoided when writing clear code.
 
-Perl 5.6 and earlier used a quicksort algorithm to implement sort.
-That algorithm was not stable and I<could> go quadratic.  (A I<stable> sort
-preserves the input order of elements that compare equal. Although
-quicksort's run time is O(NlogN) when averaged over all arrays of
-length N, the time can be O(N**2), I<quadratic> behavior, for some
-inputs.) In 5.7, the quicksort implementation was replaced with
-a stable mergesort algorithm whose worst-case behavior is O(NlogN).
-But benchmarks indicated that for some inputs, on some platforms,
-the original quicksort was faster. 5.8 has a L<sort> pragma for
-limited control of the sort. Its rather blunt control of the
-underlying algorithm may not persist into future Perls, but the
-ability to characterize the input or output in implementation
-independent ways quite probably will.
+Perl uses a I<stable> mergesort algorithm whose worst-case behavior is
+O(NlogN). Benchmarks indicate that for some inputs, on some platforms,
+quicksort can be faster, although it can have much worse performance
+for pathological inputs. See the L<sort> module for how the sort
+algorithm can be tweaked with L<a pragma|perlpragma>.

I.e. my reading of these docs was that the 5.8 reference was simply some
now-obsolete way of saying "we have this new feature now", not that the
docs had been serious about deprecating this, given that RT #119635 to
remove it didn't even get filed until 5.18 had been released!

@p5pRT commented Dec 18, 2017

From @xsawyerx

On 12/06/2017 04:51 PM, Zefram wrote:

Avar Arnfjorth Bjarmason wrote:

I wonder if we should at least do this for now:
...
+ } elsif (/^_q(?:uick)?sort$/ or $_ eq '_mergesort') {
+ warn "sort: the $_ subpragma is a no-op since 5.28!";

There's no real need for that: the error that it'll produce leads one
straight to the problem, and the change required to get it running again
is very straightforward. But if we were to make these subpragmata legal
again, it shouldn't just be in the open-ended way that you suggest.
We don't want to have that baggage around forever. We should deprecate
them, with a defined deletion date, and on the deletion date they should
become illegal (as they are now in blead). The immediate implication
is that, rather than an unconditional warning of the type you show,
they should elicit a categorised deprecation warning that advertises
the deletion date.

Is this too high a price to pay for deprecating this over time?

In other words, why avoid deprecating this cleanly as we usually do?

@p5pRT commented Dec 18, 2017

From zefram@fysh.org

Sawyer X wrote:

In other words, why avoid deprecating this cleanly as we usually do?

Because it's never been a fully supported feature. It has never incurred
the compatibility obligations that stable features do. With that
constraint not applying, we get a maintainability win from being rid of
the code quickly.

-zefram

@p5pRT commented Dec 18, 2017

From @xsawyerx

On 12/18/2017 01:18 PM, Zefram wrote:

Sawyer X wrote:

In other words, why avoid deprecating this cleanly as we usually do?

Because it's never been a fully supported feature. It has never incurred
the compatibility obligations that stable features do.

Good reason.

@p5pRT commented Dec 19, 2017

From @jplinderman

On 12/18/2017 01:18 PM, Zefram wrote:

Sawyer X wrote:

In other words, why avoid deprecating this cleanly as we usually do?

Because it's never been a fully supported feature. It has never incurred
the compatibility obligations that stable features do.

Long term, the _qsort subpragma surely deserves to die. It asks the users
to understand the innards of pp_sort.c, which they almost certainly won't,
and which are, in any event, subject to change. But I have long been
puzzled by the observation in perldoc -f sort that

... benchmarks indicated that for some inputs, on some platforms, the
original quicksort was faster.

I'm now convinced that this had much less to do with platform than with
the inputs, and that

...the ability to characterize the input or output in implementation
independent ways

is a worthy goal for the sort pragma. Many of us would consider a 10%
speedup in something as common as sorting ​as worth some maintainability
loss, and I believe there will be inputs for which quicksort will be
multiple times faster than mergesort. I'm trying to instrument a version of
pp_sort.c where quicksort is still present to replace "I believe" with "I
can show". In a similar vein, "I believe" that the Bentley/McIlroy quicksort

https://algs4.cs.princeton.edu/references/papers/bentley-mcilroy.pdf

with its sophisticated selection of pivot elements, will do an even better
job on such inputs. My current goal is to instrument a version of pp_sort.c
with *both* quicksort implementations, and with mergesort implementations
with and without stability enforced, and then beat the daylights out of
all options to prove correctness and get an objective measure of performance.
I'll then strip out the losing implementations, and try to integrate it
with the evolving pp_sort.c. If it's decided to discard whichever quicksort
algorithm wins out, that should be easy. If it looks like it's worth
keeping, we can discuss how "characterize the input" in ways that allow
pp_sort.c to select the appropriate algorithm.
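For readers who don't follow the link: the pivot-selection trick that makes the Bentley/McIlroy quicksort robust is the "ninther", the median of three medians-of-three sampled across the array. A sketch of just that piece, in Python for illustration (the paper's code is C):

```python
# Median-of-three by index: return the index, among i, j, k, whose
# element is the median of the three values.
def median3(a, i, j, k):
    x, y, z = a[i], a[j], a[k]
    if x < y:
        return j if y < z else (k if x < z else i)
    return i if x < z else (k if y < z else j)

# The Bentley/McIlroy "ninther": the median of three medians-of-three
# drawn from the low, middle, and high regions of the array.  Returns
# the index of the chosen pivot element.
def ninther(a):
    n = len(a)
    if n < 9:
        return median3(a, 0, n // 2, n - 1)
    s = n // 8
    lo = median3(a, 0, s, 2 * s)
    mid = median3(a, n // 2 - s, n // 2, n // 2 + s)
    hi = median3(a, n - 1 - 2 * s, n - 1 - s, n - 1)
    return median3(a, lo, mid, hi)
```

Sampling nine elements this way makes adversarially ordered inputs (sorted, reverse-sorted, organ-pipe) far less likely to yield the consistently bad pivots that drive quicksort quadratic.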

@p5pRT commented Jun 23, 2018

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release yesterday of Perl 5.28.0, this and 185 other issues have been
resolved.

Perl 5.28.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.28.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT commented Jun 23, 2018

@khwilliamson - Status changed from 'pending release' to 'resolved'

@p5pRT p5pRT closed this Jun 23, 2018
@p5pRT p5pRT added the Severity Low label Oct 19, 2019