Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting in reverse, #9 review #18

Open
ws-garcia opened this issue Sep 3, 2021 · 4 comments
Open

Sorting in reverse, #9 review #18

ws-garcia opened this issue Sep 3, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@ws-garcia
Copy link

Hi Senipah! As usual, I was digging into the sorting methods and the strengths/weaknesses of these algorithms. As you may know, VBA CSV interface uses, even these days, the fast dual pivot sorting method described by Yaroslavskiy, whose main disadvantage is its "instability". This is the reason why many developers opt for alternatives that could be labeled as less efficient, but that provide "stability" when it becomes imperative to sort successively using multiple columns/sort keys.

Among the "stable" algorithms recognized and acclaimed as the most efficient are MergeSort and TimSort. At that point I remembered that one of the modifications you made to BetterArray mentioned the adoption of the latter (simplified version) as the default sorting method of your wonderful library. So, after having implemented MergeSort, HeapSort and IntroSort (the latter with "stable" output when the user specifies all columns in the sort command), I wanted to take a look at the options offered by your implementation.

There I discovered, for example, that BetterArray does not take advantage of the most sought-after feature of TimSort: "stability". So, your library does not allow to sort the data using 3 keys with ascending order and 1 key with descending order that maintains, partially, the ordering given by the previous three keys.

Attached are the screenshots of the result after trying the Sort method and then Reverse that you recommend in this comment; as well as the result obtained with my implementation of MergeSort.

You may also want to do something about the performance of TimSort, I think it can be improved when the targets of the operation are not objects.

Regards!

Note: column A is sorted in ascending order, then columns B and E are sorted in the same order, finally, column C is sorted in descending order.

Expected result:
Screenshot (425)

Actual result:
Screenshot (424)

@ws-garcia ws-garcia added the enhancement New feature or request label Sep 3, 2021
@Senipah
Copy link
Owner

Senipah commented Sep 3, 2021

I don't understand your example here, how you think the expected result is incorrect if you are reversing the order, nor how this shows that my TimSort implementation is not stable.

"Stability" in this sense means that the sort maintains the relative order of records with equal keys.

The "Actual" results are what I would expect from Reversing an array - all of the results are in descending order, because the "stability" of the previous sorts is maintained, but the order reversed. Seems your issue is more that I don't provide a descending order sort? If so I will be happy to look at opening an enhancement for that feature, but this current issue makes it seem like TimSort is not stable, which is untrue.

You may also want to do something about the performance of TimSort, I think it can be improved when the targets of the operation are not objects.

Always open to concrete suggestions.

@Senipah Senipah closed this as completed Sep 3, 2021
@ws-garcia
Copy link
Author

So, your library does not allow to sort the data using 3 keys with ascending order and 1 key with descending order that maintains, partially, the ordering given by the previous three keys.

Recapitulating this point.

this current issue makes it seem like TimSort is not stable, which is untrue.

TimSort is stable, but if BetterArray does not provide a function for top-down sorting, users will never be able to produce output like the following (when a mixed sorting sense is required):

Screenshot (427)

Screenshot (428)

Note that columns sorted in ascending order partially maintain their order even if another key is used to sort in descending order.

@Senipah
Copy link
Owner

Senipah commented Sep 3, 2021

Correct, at the moment, that is not possible. I originally thought you were saying that TimSort is not stable, which it is.

However, the point is a fair one that using Reverse in place of descending sort is unsatisfactory in cases where a multi-level sort is desired as it will reverse the order of all previous sorts. I will leave this thread open as a future enhancement to add descending sort. Thanks for

Re object sorting - I am unsure what to do here really. I wrote all the methods to try to accommodate this as you know the point of this lib was to provide maximum utility/ease of use, even at the expense of performance, but at present I do prevent sorting by objects by throwing an error (even where object sorting is supported). So I need to make a decision here as to what to do in terms of supporting that...

@Senipah Senipah reopened this Sep 3, 2021
@ws-garcia
Copy link
Author

In some preliminary tests, I believe that your library can be 150% more efficient if you restrict the ordering to data other than objects, this because the number of decisions will only be a function of the data itself and not of its typology. Personally, given the enormous efficiency with which your Timsort implementation works, I think eliminating the bottlenecks of the sorting methods would be SUPER! I will keep abreast of the new improvements, if I can collaborate on something, I will definitely do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants