Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve ksAppend() #2455

Open
mpranj opened this Issue Mar 2, 2019 · 6 comments

Comments

Projects
None yet
3 participants
@mpranj
Copy link
Member

mpranj commented Mar 2, 2019

Motivation

When kdbGet() has a cache hit but the supplied keyset was not empty, we need to append the cached keyset to the non-empty keyset. In cases where the appended keyset is much larger than the destination it could be beneficial to append the smaller keyset to the larger one. (less ksAppendKey operations).

Proposal

Add a ksMerge() function, which returns only one resulting keyset and ksDel()etes the other keyset. This is for the cases, where the other keyset would be thrown away anyway.

@mpranj mpranj added the proposal label Mar 2, 2019

@markus2330

This comment has been minimized.

Copy link
Contributor

markus2330 commented Mar 2, 2019

Thank you for writing the proposal!

When kdbGet() has a cache hit but the supplied keyset was not empty

The same situation occurs when the command-line arguments are appended to proc/ in the global plugin done by @kodebach, see #2317

In cases where the appended keyset is much larger than the destination

For this specific situation, I think an if that swaps the arguments of ksAppend in the user code is acceptable.

I would like to avoid to have many API calls that appear to be highly optimized but actually only contain trivial (slow) operations. I would very much see optimizations for ksAppend. In its current form, it was maybe a mistake that we provide ksAppend.

@mpranj

This comment has been minimized.

Copy link
Member Author

mpranj commented Mar 2, 2019

I agree that improving ksAppend is more important than adding another API function.

For this specific situation, I think an if that swaps the arguments of ksAppend in the user code is acceptable.

The problem is that the semantics of ksAppend do not allow for this optimization. ksAppend has to append source to destination. Swapping the arguments would not result in an equivalent output, because we must not append destination to source.

Therefore the idea only works for a new API function, that only promises to merge two sets to one, and throws away the other set.

@kodebach

This comment has been minimized.

Copy link
Contributor

kodebach commented Mar 2, 2019

I would very much see optimizations for ksAppend

Two things that I think are quite simple to implement, but might make a huge difference:

  1. ksAppendKey calls ksSearchInternal, which does a complete binary search. This can be optimized by first comparing against the last Key and immediately returning, if the new Key has to be inserted at the end. This would make ksAppendKey calls much faster, when Keys are inserted in the correct order. (Currently we do log(ksSize) Key comparisons to arrive at this result.)

  2. ksAppend(ks1, ks2) should basically operate like this:

    • Create an array of the correct size
    • Iterate over both ks1->array and ks2->array, always appending the "smaller" (according to keyCmp) Key to the new array and advancing the corresponding iterator.
    • Replace the array of ks1 with the new one.

    That way all ksAppendKey calls we make are in the correct order and ksSearchInternal will only do a single comparison. Doing it this way would also eliminate any concerns about argument order, because the array inside ks1 would be completely recreated, instead of inserting into the existing array. It would also keep the current semantics. The only guarantees that we make are: 1) ks2 is unchanged (true, if we restore the cursor correctly), 2) ks1 contains all the keys

@mpranj

This comment has been minimized.

Copy link
Member Author

mpranj commented Mar 2, 2019

@kodebach thank you for weighing in! I think your approach sounds quite nice.

I would have to add one small improvement: when merging two sorted lists it should suffice to simply always append the smaller element. In that case we need no ksSearchInternal at all. It will always be inserted at the last position.

@kodebach

This comment has been minimized.

Copy link
Contributor

kodebach commented Mar 2, 2019

My approach would still be calling ksAppendKey so ksSearchInternal is basically unavoidable, but of course you could do the appending directly in ksAppend. However, then you have to make that any other stuff (Opmphm, keyRef, etc) that gets updated in ksAppendKey is still updated correctly.

@markus2330 markus2330 changed the title ksMerge() improve ksAppend() Mar 3, 2019

@markus2330 markus2330 added this to the 0.9.0 milestone Mar 3, 2019

@markus2330

This comment has been minimized.

Copy link
Contributor

markus2330 commented Mar 3, 2019

I like both approaches. I think we should first try @kodebach's optimization (O(1) appending in ksAppendKey). If we see that this is not enough, we can also follow @mpranj approach to swap the KeySets (so that always the smaller keyset gets appended to the larger one). I am not sure if this actually needs an API change. Maybe we can simply internally first duplicate the KeySet, and then append the small one to the large one within ksAppend; and then swap the internal array ks holds with the duplicated KeySet (holding the merged keysets).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.