Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Multiple OrderBy #152

Closed
pdf opened this issue May 3, 2017 · 5 comments
Closed

Feat: Multiple OrderBy #152

pdf opened this issue May 3, 2017 · 5 comments

Comments

@pdf
Copy link
Contributor

pdf commented May 3, 2017

The limitation of a single OrderBy field is somewhat problematic, requiring the user to do a second sort pass when multi-field ordering is required.

I'd be happy to help produce a PR, but I'd appreciate some guidance. The red-black tree would have to go, since we can't pad arbitrary-length values to obtain correct sorting over multiple keys. Since all values end up in memory anyway, would performing an in-place sort after filtering be acceptable?

The other side is the API - as OrderBy currently takes a string, I guess the way to deal with it without breaking the API is to allow OrderBy to be called multiple times.

Thoughts?

@asdine
Copy link
Owner

asdine commented May 3, 2017

This is a great idea ! We could change this OrderBy(field string) to this OrderBy(fields ...string).
It wouldn't break existing code i guess and it works with what we want to achieve. WDYT?

Also, for the red-black tree, it's not a problem but how would you achieve the in-place sort?

@pdf
Copy link
Contributor Author

pdf commented May 3, 2017

Oh, of course the variadic form will keep working for existing users, that makes life easy.

My off-the-cuff thinking was to just use sort.Sort() to quicksort the results, comparing the field list in sequence (only considering next field when current is equal) with each comparison, should still be roughly O(n*log(n)) I think. I'll have to do some more investigation to see how feasible this approach is though. If you're amenable to something like this, I'll take a closer look at the code over the weekend at what makes sense.

@asdine
Copy link
Owner

asdine commented May 3, 2017

The problem i see using sort.Sort() is that we would move from O(log n) to O(n log n) (for sorting alone) which is theorically a pretty huge gap in performance.
Also, i think we can keep using the red black tree for multi fields sorting using a little trick.

Currently, we are generating a slice of bytes from the value of the chosen field (see here)
We could keep doing this by concatenating all the fields together after encoding and give these keys to the red black tree.

Example:
OrderBy("Name", "Age", "Country")
-> sortingKey for record A = "John11France"
-> sortingKey for record B = "John11Germany"
-> sortingKey for record C = "John10Germany"

which would be sorted like this: [C, A, B]

WDYT?

@pdf
Copy link
Contributor Author

pdf commented May 3, 2017

Unfortunately, that doesn't work as expected, consider:

// OrderBy("Name, "Age", "Country")

A = Person{Name: "John12",  Age: 9, Country: "France"}
B = Person{Name: "John" Age: 15, Country: "Germany"}
// A sorting key: `John129France`
// B sorting key: `John15Germany`

Expected result (A.Name > B.Name): [B, A]
Actual Result: [A, B]

This is why I mentioned the issue with arbitrary padding - to use a single string for comparison you'd have to pad each field to the maximum length from all values, which you don't know ahead of time, and the keys would blow out massively in memory and comparison time.

As an aside, time-complexity of the tree sort should be worst-case O(n*log(n)), no? Quicksort can certainly be worse, so I'm open to ideas on how to retain the tree, but I'm not seeing any that are sane.

@asdine
Copy link
Owner

asdine commented May 3, 2017

You are right, i spoke too fast. Furthermore, it's more interesting to compare non encoded values together than encoded ones since they are already loaded in memory, and i don't see any other solution to keep using the rb tree.

Let's try with the sort.Sort func 👍

pdf added a commit to pdf/storm that referenced this issue May 6, 2017
Results are sorted by the specified fields with descending precedence.

Fixes asdine#152
@pdf pdf mentioned this issue May 6, 2017
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants