-
-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std.algorithm.schwartzSort is slow #9891
Labels
Comments
peter.alexander.au (@Poita) commented on 2010-10-19T00:05:18ZI get quite similar timings:
sort: 3.8
alt. sort: 3.4
schwartz: 25.8
This is with -inline -O -release. |
andrei (@andralex) commented on 2010-10-19T06:37:48ZThis is a misunderstanding of Schwartz sort is supposed to do. D's equivalent of Python's key argument is not Schwartz sort, but instead:
sort!q{p.x < p.y}(data);
i.e. the keys are not copied but instead a projection is used for the comparison. That's your "alt" sort. Schwartz sort is meant for usage when the key computation (in this case a simple member access) is too expensive to be done more than once. To avoid that, schwartzSort creates an array of the computed keys and then sorts the keys and the original arrays in lockstep. It is expected that schwartzSort is considerably slower than sort for cheap keys. It is also expected that "alt" sort is the best of the breed because it has the fastest key computation.
I'm leaving this open in case you have new experiments that do reveal a problem. Otherwise, feel free to close it. |
bearophile_hugs commented on 2010-10-22T04:52:28Z(In reply to comment #2)
Thank you for your answers.
> This is a misunderstanding of Schwartz sort is supposed to do. D's equivalent
> of Python's key argument is not Schwartz sort, but instead:
>
> sort!q{p.x < p.y}(data);
>
> i.e. the keys are not copied but instead a projection is used for the
> comparison. That's your "alt" sort.
In that D benchmark there are 3 kinds of sort, two use the normal sort() the other uses schwartzSort.
The "alternative" version is still a normal sort. The only difference is that it uses a delegate, as you see from the code:
(P a, P b) { return a.y < b.y; }
instead of a "template lambda":
q{ a.y < b.y }
Regarding Python, years ago Python2 used to have just a sort like this, with the "cmp" argument:
s.sort(cmp)
From the docs:
http://docs.python.org/library/stdtypes.html#mutable-sequence-types
cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None.
Recent Python2 versions use have a more complex sort signature:
s.sort([cmp[, key[, reverse]]])
Where beside the "cmp" that's still present, there is the "key" function:
key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None.
Python3 has removed the cmp argument.
In the bug report I was referring to "key" that's a function that takes a single argument and return a single transformed item. So in Python if you use the "key" you are performing a Schwartz sort.
C source code of CPython is available online, if you use "key" CPython builds a temporary array of the transformed data, that is used to sort the true data.
> I'm leaving this open in case you have new experiments that do reveal a
> problem. Otherwise, feel free to close it.
The performance of schwartzSort is too much low, so in my opinion the bug report needs to be open still. |
bearophile_hugs commented on 2010-10-22T04:55:33ZSorry, I have forgotten a Python version of the code:
from random import random, seed
from operator import itemgetter
SortType_none = 0
SortType_sort = 1
SortType_schwartz = 2
DATA_LEN = 1000 # **********
sort_type = SortType_schwartz
def main():
seed(1)
data = [(random(), random()) for _ in xrange(DATA_LEN)]
if len(data) < 50: print data
if sort_type == SortType_schwartz:
data.sort(key=itemgetter(1))
data.sort(key=itemgetter(0))
data.sort(key=itemgetter(1))
if sort_type == SortType_sort:
data.sort(cmp=lambda a, b: cmp(a[1], b[1]))
data.sort(cmp=lambda a, b: cmp(a[0], b[0]))
data.sort(cmp=lambda a, b: cmp(a[1], b[1]))
if len(data) < 50: print data
main() |
bearophile_hugs commented on 2011-06-21T15:53:29ZSee bug 6192 for more |
dlang-bugzilla (@CyberShadow) commented on 2017-07-07T21:52:16ZStill reproducible in 2017.
FWIW, with LDC, the difference is smaller: only schwartzSort is slower than regular sort only by about one third rather than being about twice as slow. |
safety0ff.bugz commented on 2017-07-08T18:14:59Z(In reply to Vladimir Panteleev from comment #6)
> Still reproducible in 2017.
>
> FWIW, with LDC, the difference is smaller: only schwartzSort is slower than
> regular sort only by about one third rather than being about twice as slow.
Sounds like typical poor performance of using DMD with ranges (i.e. std.range.zip used by schwartzSwort.)
Possible duplicate of: #14943 & #16120
Aside: Stripping out dynamic stopping policies from std.range.zip led to a few % improvement on LDC, nothing big though. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
bearophile_hugs reported this on 2010-10-18T19:08:43Z
Transfered from https://issues.dlang.org/show_bug.cgi?id=5077
CC List
Description
The text was updated successfully, but these errors were encountered: