Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ksExtract #2992

Closed
kodebach opened this issue Sep 20, 2019 · 14 comments

Comments

@kodebach
Copy link
Contributor

commented Sep 20, 2019

The snippet

KeySet * ks1 = ksDup(ks);
KeySet * interestingPart = ksCut (ks1, key);
ksDel (ks1);
// .... use interestingPart ....

or the alternative

KeySet * interestingPart = ksCut (ks, key);
// .... use interestingPart ....
ksAppend (ks, interestingPart);

are used quite frequently. We should add a counterpart to ksCut that does not modify the original KeySet. The ideal signature would be:

// ksCopy would go better with ksCut, but it is already in use
KeySet  * ksExtract (const KeySet * ks, const Key * parentKey);

The const KeySet * would be possible with an external iterator, but it might even be possible with the old system.

Originally posted by @kodebach in #2979 (comment)

@kodebach kodebach added the proposal label Sep 20, 2019
@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 20, 2019

Thank you for creating this proposal!

Do you think ksExtract should be the function instead of ksDup? If it is only an idea for a new feature, we could also add it any later time (it would not be relevant for 1.0).

Btw. I wonder why you did not request for a KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));. When someone passes keyIsBelowOrSame you would get something like you requested above.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 20, 2019

Do you think ksExtract should be the function instead of ksDup?

They could be merged, but then we would should optimise for the ksDup case. I don't think that is a good idea. It just complicates the implementation and the API is clearer, if ksDup exists on its own. I wouldn't expect that ksExtract is what I need to duplicate a KeySet.

Btw. I wonder why you did not request for a KeySet * ksExtract (const KeySet * ks, const Key * parentKey, int (*toextract)(Key*, Key*));

I assume the function would take the keys below parentKey (including parentKey) and then additionally filter them with toextract (should use const Key * btw). I don't see why that would have to be separate function. You could easily iterate over the extracted keys yourself and just skip any keys you don't care about.

It also prevents an optimisation that could be applied to ksCut as well. (Note: I haven't tested this)

ksSearchInternal returns the index of the given key, or if isn't present the index of the first present key that is bigger than the given key (according to keyset order). ksCut already uses this to find the start of the cut.

The end of the cut, could be found via ksSearchInternal as well. We just need to search for the right key. An example:

  • Assume a keyset with the following keys:
    user/test/abcd
    user/test/cutpoint
    user/test/cutpoint/a
    user/test/cutpoint1
    user/test/xyz
    
  • These keys have the unescaped names (I use for the NULL character U+0000):
    user⓪test⓪abcd⓪
    user⓪test⓪cutpoint⓪
    user⓪test⓪cutpoint⓪a⓪
    user⓪test⓪cutpoint1⓪
    user⓪test⓪xyz⓪
    
  • We called ksCut with the key user/test/cutpoint.
  • It calls ksSearchInternal to find the start of the cut. It searches the unescaped name user⓪test⓪cutpoint⓪.
  • The current iteration now starts a loop and checks every following key, until it finds user/test/cutpoint1. Instead we can use ksSearchInternal again.
    • This time we search for the unescaped name user⓪test⓪cutpoint①⓪ ( represents U+0001). This is the first possible unescaped key name that is not below user/test/cutpoint.
    • While this key will almost never be present in the keyset, that doesn't matter ksSearchInternal will just return the next key that is present.
    • That means the returned key will always be the first key after our cut.
@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2019

wouldn't expect that ksExtract is what I need to duplicate a KeySet.

Exactly, and ksExtract is basically only ksDup with filter.

Thank you for looking into the details. But I think ksExtract is only an new feature and does not really help for 1.0.

The optimization for ksCut is a very good idea, can you maybe split this up in a second issue? But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 22, 2019

Exactly, and ksExtract is basically only ksDup with filter.

Not sure, what I want say here... ksExtract is technically only a limited ksDup. Sometimes you need exactly that and without ksExtract you get a lot of unnecessary overhead (either copying and deleting additional keys, or cutting and appending needlessly).

Replacing ksDup with ksExtract would be possible, like you said, but it would be a confusing naming scheme. If we only want a single function, we should change ksDup to accept a parent key (if it is NULL we use the old behaviour). With symbol versioning this shouldn't be hard to do.

But for 1.0 we also only should do optimizations which are really important (have use cases that are too slow otherwise.)

I'm not sure about "too slow otherwise", but ksCut is definitely used a lot, so any improvement could make a big difference.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2019

If we can efficiently extract/cut arrays, we actually do not need to store the length of arrays, do we? We could reconsider doc/decisions/array.md

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2019

Another idea: what about a ksLookup option to find the element hierarchically after some key.

So something like ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY) would return
user/test/xyz (in your example above)

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 23, 2019

we actually do not need to store the length of arrays, do we?

Both the spec plugin and the high-level API rely on the array metakey. In general I also think that storing the size of an array separately is a good idea. It is basically the same reasoning that also explains, why most (if not all) modern programming languages store string sizes, instead of calculating them when needed like C.

What we should do instead, to make handling of arrays easier, is introduce things like kdb array add <arrayParentKey> <index> and kdb array del <arrayParentKey> <index>.

Another idea: what about a ksLookup option to find the element hierarchically after some key.

So something like ksLookup(ks, "user/test/abcd", KDB_O_HIERARCHY) would return
user/test/xyz (in your example above)

Not sure what "hierarchically after" means. If you mean the next key after X (according to keyset order) that is not below X, then your example is wrong.

Nevertheless, I think ksLookup is already more than complex enough and we should not add anything else. Adding new options to ksLookup complicates the code. This introduces potential for errors and most likely also creates a performance overhead (not a big one, but ksLookup is one of the most called functions in Elektra, so every little bit counts).

There are also further question with this approach: How does it combine with other option? What happens with cascading keys?

IMO we should stick with KISS and introduce a new function, if we need new functionality. Also this kind of lookup would probably be more useful, if it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees. I can't actually think of another use case right now, especially not one, where I just want a Key *.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 23, 2019

Both the spec plugin and the high-level API rely on the array metakey.

Good to know! This is completely undocumented in doc/METADATA.ini where array is still an unused and only proposed feature.

What we should do instead, to make handling of arrays easier, is introduce things like kdb array add and kdb array del .

Yes, I agree. But this is not essential for 1.0.

Not sure what "hierarchically after" means.

The key after the cut point (as you described above).

most likely also creates a performance overhead

ksLookup already has a switch (or cascading if) for the options. So adding another option should not create a performance impact.

There are also further question with this approach: How does it combine with other option? What happens with cascading keys?

This we need to consider anyway.

IMO we should stick with KISS and introduce a new function

Have you a suggestion for a name?

And there is some overhead for adding functions to a library. (Did you maybe already measured it?)

it works with iterators/cursors/indices (whatever we want to call it). Then it could be used in an iteration to skip subtrees

With the internal cursor ksLookup is the way to go as it moves the internal cursor.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 23, 2019

The key after the cut point (as you described above).

Then your example should have been one of:

ksLookupByName (ks, "user/test/abcd", KDB_O_HIERARCHY); // -> user/test/cutpoint
ksLookupByName (ks, "user/test/cutpoint", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint/a", KDB_O_HIERARCHY); // -> user/test/cutpoint1
ksLookupByName (ks, "user/test/cutpoint1", KDB_O_HIERARCHY); // -> user/test/xyz

So adding another option should not create a performance impact.

Like I said, it would be a tiny performance impact, but ksLookup is called thousands of times in an application (about 20 000 times in a minimal LCDd setup) so even that could matter.

And there is some overhead for adding functions to a library.

In terms of performance?

With the internal cursor ksLookup is the way to go as it moves the internal cursor.

Don't we want to get rid of the internal cursor? (#2991 (comment))

Have you a suggestion for a name?

Depends on what the function will actually do and whether is based on Key * or cursor_t.

In general, I think ksLookup should be for looking up a key, i.e. I have a keyname (cascading or not) and want to find the actual Key * inside a keyset for this keyname. In other words the input keyname of ksLookup should always match its output keyname (apart from the namespace).

Another function should be created for different kinds of search operations. It might be useful not only to find the next key on the same level, but also the previous one or maybe the first key above (the immediate parent may not be in the keyset) or below.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2019

In terms of performance?

Yes, also performance, as the symbols need to be looked up when loading the libs. KDE hat huge problems with this but they also have an order of magnitude more libs.

Don't we want to get rid of the internal cursor? (#2991 (comment))

Realistically, we will only be able to remove it from the public interface but not completely. Also the error messages need to be updated to contain the cause (@Piankero) because at the moment the cursor marks where the error happened.

In general, I think ksLookup should be for looking up a key

Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to ksLookup at all.

And if we do this only in the public interface (and let the private interface more or less the same), we could even do it for 1.0.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2019

Realistically, we will only be able to remove it from the public interface but not completely.

Why? Where do we actually need the internal cursor?

because at the moment the cursor marks where the error happened.

I don't think this is always the case. In certain situations, kdbGet executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.

Yes, it would be a cleaner interface. But then we also should remove KDB_O_POP which is actually the only left-over. If we get rid of this, we could avoid the argument to ksLookup at all.

KDB_O_CREATE and KDB_O_DEL might also be used in some places, though the should be easy to replace.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2019

Why? Where do we actually need the internal cursor?

Maybe nowhere (except for the old system to pinpoint which key caused the error) but there are 271 occurrences of ksNext in the folder "src". And in these 271 places it is not enough to replace a function call but you actually need to rewrite loops. So there are also plenty ways of how bugs can be introduced.

I don't think this is always the case. In certain situations, kdbGet executes more plugins after an error and many plugins don't set the cursor to the key that caused the error in the first place.

Yes, exactly.

KDB_O_CREATE and KDB_O_DEL might also be used in some places, though the should be easy to replace.

In only about 10 places, so this should be manually doable in one day of work.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2019

there are 271 occurrences of ksNext in the folder "src".

I never said it would be easy

@markus2330 markus2330 added this to the 2.0.0 milestone Oct 17, 2019
@markus2330

This comment has been minimized.

Copy link
Contributor

commented Oct 17, 2019

I think all of this could be moved to 2.0.0. If there is something reasonable for 1.0.0, please tell me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.