Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache problems #2806

Open
kodebach opened this issue Jun 23, 2019 · 19 comments

Comments

Projects
None yet
3 participants
@kodebach
Copy link
Contributor

commented Jun 23, 2019

I still highly suspect that these disabled parts of kdbGet will cause problems with procgetstroage plugins.

if (debugGlobalPositions)
{
keySetName (parentKey, keyName (initialParent));
elektraGlobalGet (handle, ks, parentKey, PREGETSTORAGE, INIT);
elektraGlobalGet (handle, ks, parentKey, PREGETSTORAGE, MAXONCE);
elektraGlobalGet (handle, ks, parentKey, PREGETSTORAGE, DEINIT);
keySetName (parentKey, keyName (initialParent));
elektraGlobalGet (handle, ks, parentKey, PROCGETSTORAGE, INIT);
elektraGlobalGet (handle, ks, parentKey, PROCGETSTORAGE, MAXONCE);
elektraGlobalGet (handle, ks, parentKey, PROCGETSTORAGE, DEINIT);
}

However, I cannot verify that, because I simply cannot produce a cache hit. This is because a cache miss is caused, if the resolver returns NO_UPDATE for any of the backends.

case ELEKTRA_PLUGIN_STATUS_NO_UPDATE:
// Nothing to do here
break;

A cache hit only happens, if CACHE_HIT was returned for all backends.

if (cacheHits == split->size)
{
ELEKTRA_LOG_DEBUG ("all backends report cache is up-to-date");
return -2;
}

Not sure, whether this is intended or not. I also couldn't find any test cases for the cache, through which I could find out how to get a cache hit.


My suspicion of the bug in kdbGet should be verifiable with the examples/gopts example. If there is some way that its kdbGet call executes the "cache hit" or "no update" branches, the gopts plugin won't be called and command-line options will not be reported correctly.

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 23, 2019

Not sure, whether this is intended or not.

It is intended, yes. Without even testing it, I also think that you are right with your suspicion.

The current implementation (the diabled parts) yield about 80x improvement vs. about 10x improvement when the positions are run every time. The slow part was mostly spec afair. This is why we chose to disable these positions, as nobody was using them and the spec plugin was slow there.

If you feel like something essential was disabled, feel free to enable it again. If the plugins at that position are fast, it won‘t even be noticed.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

However, I cannot verify that, because I simply cannot produce a cache hit.

It should be quite easy to reproduce a cache hit: simply mount only supported plugins in one mountpoint and then do kdb get on this mountpoint.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2019

simply mount only supported plugins in one mountpoint and then do kdb get on this mountpoint.

You also need do a kdb set for the mountpoint, otherwise you get NO_UPDATE. And in my case for the cascading mountpoint in examples/gopts, I had to kdb set once for each namespace.


As for the NO_UPDATE case in kdbGet. It is might be correct after all, at least it works for gopts. As far as I can tell, the NO_UPDATE case is used in two cases:

  1. None of the requested mountpoints have any existing data (i.e. file/block/commit etc doesn't exist). This means the storage plugin call can be replaced by ksNew(0, KS_END).
  2. We already called kdbGet once with this KDB handle and since the last call, nothing changed.

For the second case not calling plugins is definitely correct. This is because the idea behind procgetstorage plugins is that they produce data based on the results from getstorage plugins together with static process specific properties. (Static meaning the properties do not change as long as the process exists.) Therefore the result of procgetstorage plugins must be the same, if the results from getstorage plugins are unchanged, which is the case for 2.

Case 1 on the other hand requires a more strict interpretation of what procgetstorage plugins are allowed to do. If we define that calling a procgetstorage plugin with an empty KeySet must yield an empty KeySet (i.e. a plugin that e.g. puts the PID into a fixed Key is not allowed), then not calling plugins should be fine here as well.


Anyway, I am marking this issue as urgent, because it is even worse than I thought. The rare NO_UPDATE case is probably not wrong, but the CACHE_HIT case is definitely broken.

  1. In case of a CACHE_HIT, we do not execute procgetstorage plugins. This means command-line options from the gopts plugin are completely ignored. (Note: since procgetstorage adds keys, all following global positions have to be executed as well)
  2. We actually cache the proc namespace. This means in case of a cache hit, not only are current command-line options ignored, but in fact the applications runs with the cached ones.

@kodebach kodebach added the urgent label Jun 24, 2019

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

We actually cache the proc namespace. This means in case of a cache hit, not only are current command-line options ignored, but in fact the applications runs with the cached ones.

Are you sure about this? Where does procgetstorage put its keys into? If they do not belong to a backend, they are also not cached. Otherwise any non-empty keyset put into a kdbGet call would also cause this problem (which is not the case).

Other than that, CACHE_HIT should execute exactly the same global positions as NO_UPDATE.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2019

Are you sure about this?

Yes, just tried it.

Where does procgetstorage put its keys into?

Into the proc namespace, where exactly depends on the plugin.

gopts calls elektraGetOpts. That in turn takes the parentKey, turns it into a spec namespace key and uses the part of the KeySet below that. For each spec key that contains any of the opt* metakeys, it then produces a proc key with the same name, if the command-line option is present.

If they do not belong to a backend, they are also not cached

I used spec-mount. I have no idea what it actually does, but kdb mount reports a mountpoint with cascading parent after using spec-mount. That cascading parent of course matches the proc keys created by gopts, maybe that is what causes the problem.

Anyway the nature of proc keys should make it safe to simply ksCut out all proc keys before caching.

Other than that, CACHE_HIT should execute exactly the same global positions as NO_UPDATE.

Right now both execute no plugins, except if (debugGlobalPositions) then NO_UPDATE executes some global plugins.

To make it correct, CACHE_HIT has to execute procgetstorage and following positions (unless procgetstorage returns no update), if there is a procgetstorage plugin. By default there is always a procgetstorage plugin since the list plugin has to be mounted in all positions.

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Yes, just tried it.

That is really bad, but also makes no sense to me. CACHE_HIT can only occur, where a resolver reports that the file timestamps have not changed. What you are describing would mean that those proc keys are dropped into a normal backend with a real resolver. That would cause problems even without the cache.

To make it correct, [...]

If you know of a specification of the global plugin positions, please refer me to it. This is exactly the point where I stopped working on the implementation of the global plugins. I asked about the semantics of some positions, but nobody really knows what each position should do. Now we have even more positions, but the semantics are still undocumented (to my knowledge).

What seems to be another problem to me here are contradicting design goals. On the one hand we want to cache the data from kdbGet, on the other hand we have plugins injecting keys dynamically during a get operation. During the design of the cache we discussed that a good mmap cache implementation would make a good argument against allowing such semantics (injecting dynamic data like timestamps etc.).

That being said, as I already mentioned, we can enable any needed positions.

CACHE_HIT has to execute procgetstorage and following positions

Are these the same as in NO_UPDATE or do you mean literally all positions here?

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2019

If you know of a specification of the global plugin positions, please refer me to it

I don't know of any specification either. I just know about procgetstorage because I created it for gopts.

What seems to be another problem to me here are contradicting design goals. [...]

That it is true, but right now I think just running procgetstorage should be good enough. It is not good solution, but until a better solution is found I think it is a good compromise.

Are these the same as in NO_UPDATE or do you mean literally all positions here?

The keys from procgetstorage don't belong to backend, so just running the global plugins like in NO_UPDATE right now, should be fine.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

Thank you for this beautiful discussion.

That it is true, but right now I think just running procgetstorage should be good enough. It is not good solution, but until a better solution is found I think it is a good compromise.

Yes, I am fine with that.

I don't know of any specification either

The only specification we have is doc/decisions/global_plugins.md which is unfortunately not finished and not implemented. @mpranj will also not implement it now as he also wants to finish his thesis. Maybe @mpranj is right and we should add positions when they are needed with the semantics as needed. This might lead to a mess if we are not very attentive to new positions but it would avoid many useless positions.

The keys from procgetstorage don't belong to backend, so just running the global plugins like in NO_UPDATE right now, should be fine.

Actually, we want the semantics that it will only run once after kdbOpen (also in the case of cache hits) if the parentKey contains proc. In later executions it will not be needed as cmd-line args and env cannot change after an executable has been started. (We ignore setenv as this would be only misuse of Elektra.)

We definitely need a special case for procstorage by design.

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Thank you for elaborating! I think your last proposal sounds good for now too (but in general, this is something we need to think about).

The keys from procgetstorage don't belong to backend, so just running the global plugins like in NO_UPDATE right now, should be fine.

Again, if they really do not belong to a backend, then they are never cached. I tested this sometime by hand but I'll write a test right now to confirm it. Otherwise it's a bug in my implementation.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

Again, if they really do not belong to a backend, then they are never cached. I tested this sometime by hand but I'll write a test right now to confirm it. Otherwise it's a bug in my implementation.

Actually, it does not make sense to cache proc keys at all.

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Actually, it does not make sense to cache proc keys at all.

I know, but @kodebach insisted that some proc keys were cached in his tests, which would be really really bad.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

I mean that proc keys should never be cached regardless of whether they belong to a mountpoint (at the moment it is not possible to mount at proc but this might be changed).

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2019

In later executions it will not be needed as cmd-line args and env cannot change after an executable has been started.

The spec could have changed and therefore the result might change. Maybe the best solution is to specify: If the cache is used you may only call kdbGet once for an instance of KDB.

Again, if they really do not belong to a backend, then they are never cached.

Like I said I think the problem is the cascading mountpoint from spec-mount. It causes a lot of weirdness. If the check for does key X belong to backend Y is the same as checking that key X is below the parent of backend Y with keyIsBelow then the proc keys will belong to cascading mountpoints.

I know, but @kodebach insisted that some proc keys were cached in his tests, which would be really really bad.

If you want to try it yourself:

Import this spec as spec/sw/org/erm/#0/current/

[]
 mountpoint = erm.conf

[emptydirs]
 opt/arg = none
 opt/long = dir
 opt = d
 description = remove empty directories

[files/#]
 description = the files that shall be deleted
 args = remaining
 env = FILES

[recursive]
 opt/#1 = R
 opt/#0/arg = none
 opt/#0 = r
 opt/#1/arg = none
 opt = #1
 description = remove directories and their contents recursively
 opt/#0/long = recursive

[interactive]
 opt/arg/name = WHEN
 opt/#1 = I
 opt/#0/arg = optional
 opt/#0 = i
 opt/#1/arg = none
 opt = #1
 description = prompt according to WHEN: never, once (-I), or always (-i), without WHEN, prompt always
 opt/#0/long = interactive
 opt/#1/flagvalue = once
 opt/#0/flagvalue = always

[nopreserve]
 opt/arg = none
 opt/long = no-preserve-root
 description = do not treat '/' specially

[singlefs]
 opt/arg = none
 opt/long = one-file-system
 description = when removing a hierarchy recursively, skip any directory that is on a file system different from that of the corresponding line argument

[showversion]
 opt/arg = none
 opt/long = version
 description = output version information and exit

[preserve]
 opt/arg/name = all
 opt/arg = optional
 opt/long = preserve-root
 description = do not remove '/' (default), with 'all', reject any command line argument on a separate device from its parent
 opt/flagvalue = root

[verbose]
 opt/arg = none
 opt/long = verbose
 opt = v
 description = explain what is being done
 env = VERBOSE

[force]
 opt/arg = none
 opt/long = force
 opt = f
 description = ignore nonexistent files and arguments, never prompt

EDIT: forgot the most important step:

Spec-mount the specification:

kdb spec-mount /sw/org/erm/#0/current

END OF EDIT

Set values for all the namespaces:

cd $ELEKTRA_SRC_ROOT/examples
kdb set -N dir /sw/org/erm/#0/current/verbose 0
kdb set -N user /sw/org/erm/#0/current/verbose 0
kdb set -N system /sw/org/erm/#0/current/verbose 0

(Note: The cd is important for the dir namespace)

Modify the examples/gopts.c:

Now you can complie and debug gopts. You should see that the first call to gopts is a cache miss and the following calls are all cache hits, because the KDB didn't change in between. Therefore if the first call was gopts -f abc and the second one is gopts -v, both will behave the same (like the first one).

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

The spec could have changed and therefore the result might change.

With the specload plugin the spec changes would hopefully only be "sane" and compatible to the previous ones. If you drop command-line options or change types you would need a new major version of the spec.

Maybe the best solution is to specify: If the cache is used you may only call kdbGet once for an instance of KDB.

An application cannot (and should not) know if a cache is used.

kdbGet() executed repeatedly without config changes should be a no-op.

If you want to try it yourself:

Thanks for the test! I would love to see this as shell-recorder test.

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 24, 2019

With the specload plugin the spec changes would hopefully only be "sane" and compatible to the previous ones. If you drop command-line options or change types you would need a new major version of the spec.

specload is just one storage plugin. It is completely optional and we shouldn't in any way rely on it in kdbGet. This mistake was already made with spec and list.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 24, 2019

Of course specload is validation and:

  1. different implementations may come
  2. there are always ways to bypass validation.

But from an application that was started with app --foo there is no sane way to later say --foo is not a valid command-line argument. Thus we need to assume (or validate) that specifications only change in ways that is compatible with running applications. (Otherwise a new major version of the config spec is needed.)

So I think for now (for LCDproc) it is safe to assume that command-line options are not changed for running applications, so changes in spec/ are ignored. And with this assumption, changes in proc/ are not possible due to its semantics.

@markus2330

This comment has been minimized.

Copy link
Contributor

commented Jun 25, 2019

I split the rare case described here to #2810

So this issue is about cache hits+procgetstorage

@mpranj

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

@kodebach: would it be enough to simply cut "proc/" here and let the PROC positions run in case of cache hits, or is that not enough for the expected behaviour?

@kodebach

This comment has been minimized.

Copy link
Contributor Author

commented Jun 25, 2019

would it be enough to simply cut "proc/"

That should be done before caching in any case. proc keys are process specific so caching across processes makes no sense.

let the PROC positions run in case of cache hits

Should be enough for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.