objects and memory leaks #3547
Comments
Comment from mreynolds (@mreynolds389) at 2019-07-11 19:33:06 Yeah, I've had plenty of issues with this object acquiring and releasing when I worked on CleanAllRUV. It is definitely a mess, but at the same time we don't seem to have any known crashes related to this; at least I'm not aware of any. I guess the crash threat is more around removing a replica while the replica is processing updates? Adding a lock might work, but I worry about the performance impact.
Comment from mreynolds (@mreynolds389) at 2019-07-11 19:33:07 Metadata Update from @mreynolds389:
Comment from lkrispen (@elkris) at 2019-07-14 20:43:45 Yes, we should not put any effort into the object mechanism. I am thinking about an approach in the other direction, at least for the replica objects.
Comment from firstyear (@Firstyear) at 2019-07-15 01:12:41 How hard would it be to remove the "object" mechanism from the replica struct, and then rethink how we do refcounting?
Comment from tbordaz (@tbordaz) at 2019-07-15 17:35:27 Just a comment: I agree with @mreynolds389 and @elkris. The object code would benefit from a lock, but considering the possible impact on performance and that we have not seen any crash, we may postpone adding a lock. I think the object mechanism is not designed for long-lived objects. We continuously access (acquire/release) an object that is guaranteed to exist until the end. It is a waste of energy and, as you said @elkris, it is buggy anyway because acquire and release calls are not balanced.
Comment from mreynolds (@mreynolds389) at 2019-08-08 17:27:13 Metadata Update from @mreynolds389:
Comment from lkrispen (@elkris) at 2020-02-26 16:14:31 Fixed: #3579
Comment from lkrispen (@elkris) at 2020-02-26 16:14:42 Metadata Update from @elkris:
Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/50490
When I investigated memory leaks, I came across one type of leak involving objects, which also raises the question of the safety of the object implementation.
The leak is (only part of it; there are many more allocations inside replica_new()):
So a replica object is created by a call to replica_new() and then added as an object to the mapping tree:
It provides a destructor, replica_destroy(), but this is only called if refcount == 1 when object_release() is called.
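To make the release condition concrete, here is a minimal sketch of a refcounted wrapper in C11. It is not the actual 389-ds object code: sketch_object, sketch_object_new(), sketch_object_acquire(), sketch_object_release() and the counting destructor sketch_count_destroy() (a stand-in for replica_destroy()) are hypothetical names. It shows that the destructor runs only on the release that finds a refcount of 1, so every missing release keeps the object alive forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical destructor type; the real object.h may differ. */
typedef void (*sketch_free_fn)(void *data);

/* Minimal refcounted wrapper, loosely modeled on the acquire/release
 * pattern described above. Names are illustrative, not the 389-ds API. */
typedef struct {
    atomic_int refcnt;
    void *data;
    sketch_free_fn destructor;
} sketch_object;

sketch_object *sketch_object_new(void *data, sketch_free_fn destructor)
{
    sketch_object *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    atomic_init(&o->refcnt, 1); /* the creator owns the first reference */
    o->data = data;
    o->destructor = destructor;
    return o;
}

void sketch_object_acquire(sketch_object *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* Drops one reference. Returns 1 if this call freed the object. */
int sketch_object_release(sketch_object *o)
{
    /* atomic_fetch_sub returns the value before the decrement, so 1 here
     * means "refcount == 1 when release was called": the last reference. */
    if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
        if (o->destructor != NULL)
            o->destructor(o->data);
        free(o);
        return 1;
    }
    return 0;
}

/* Hypothetical stand-in for replica_destroy(): only counts its calls. */
int sketch_destroy_calls = 0;
void sketch_count_destroy(void *data)
{
    (void)data;
    sketch_destroy_calls++;
}
```

With one acquire after creation, the first release only drops the count to 1; only the second release runs the destructor and frees the wrapper.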
It is hard to find where the balance between object_acquire() and object_release() gets broken.
I did some investigation and tentative fixes, adding calls to object_release() where I thought they were missing, with mixed success. In some test scenarios the leak disappeared; in others the refcount at the last call of object_release() was down to 2 (one release still missing); and in others I got a crash because the object had already been freed before the last call.
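The three outcomes above can be reproduced with a small single-threaded model. This is a hypothetical sketch, not instrumentation that exists in the server: model_obj, model_acquire(), model_release() and model_leaked() are illustrative names. The object is only marked as destroyed instead of freed, so an extra release shows up as a return value rather than as undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the refcount: the object is marked as destroyed
 * instead of freed, so misuse is observable instead of undefined. */
typedef struct {
    int refcnt;
    bool destroyed;
} model_obj;

typedef enum {
    MODEL_RELEASED,      /* reference dropped, object still alive */
    MODEL_DESTROYED,     /* last reference dropped, destructor would run */
    MODEL_USE_AFTER_FREE /* release on an already destroyed object */
} model_result;

void model_acquire(model_obj *o)
{
    o->refcnt++;
}

model_result model_release(model_obj *o)
{
    if (o->destroyed)
        return MODEL_USE_AFTER_FREE; /* in the real code: a crash or heap corruption */
    if (--o->refcnt == 0) {
        o->destroyed = true;
        return MODEL_DESTROYED;
    }
    return MODEL_RELEASED;
}

/* After all the releases we know about, a live object with a nonzero
 * refcount means some release is missing: the leak case. */
bool model_leaked(const model_obj *o)
{
    return !o->destroyed && o->refcnt > 0;
}
```

A missing release leaves the object alive with the last call having seen a refcount of 2, and one release too many hits an object that is already gone.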
So there is definitely a mess in our acquire/release balance, but while investigating this I also started to doubt how safe our object implementation is.
The only protection is a refcount that prevents an object from being freed while it is in use, but there are some conditions where this can go wrong.
First, look at object_acquire():
There is no guarantee that the object is still valid when it is called. For example, we keep references in the prp replication protocol struct, which are then used again and again by the replication protocol.
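One way to avoid calling acquire on an object that may already be gone is to never keep a bare pointer in a long-lived struct. Either the holder owns a reference, or the lookup and the increment happen under the same lock that protects removal. The sketch below assumes POSIX threads and C11 atomics; ref_obj, ref_obj_new(), ref_release(), ref_slot, slot_init(), slot_get(), slot_set() and slot_clear() are hypothetical names, with ref_slot standing in for something like the mapping tree entry. It is a variant of the lock discussed in the comments, placed around the lookup rather than inside the object code, and it carries the same kind of performance cost that was raised there.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object (not the real 389-ds Object). */
typedef struct {
    atomic_int refcnt;
    int payload;
} ref_obj;

ref_obj *ref_obj_new(int payload)
{
    ref_obj *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    atomic_init(&o->refcnt, 1); /* reference owned by the creator */
    o->payload = payload;
    return o;
}

void ref_release(ref_obj *o)
{
    if (atomic_fetch_sub(&o->refcnt, 1) == 1)
        free(o);
}

/* Hypothetical long-lived holder. While the pointer is stored here, the
 * slot owns one reference, so the refcount cannot reach 0 while the lock
 * is held and the pointer is still set. */
typedef struct {
    pthread_mutex_t lock;
    ref_obj *obj;
} ref_slot;

void slot_init(ref_slot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    s->obj = NULL;
}

/* Lookup and increment happen under the lock, so they cannot interleave
 * with slot_set()/slot_clear() dropping the slot's reference. */
ref_obj *slot_get(ref_slot *s)
{
    pthread_mutex_lock(&s->lock);
    ref_obj *o = s->obj;
    if (o != NULL)
        atomic_fetch_add(&o->refcnt, 1);
    pthread_mutex_unlock(&s->lock);
    return o; /* caller must call ref_release() when done */
}

/* Stores o (taking over the caller's reference) and drops the old one. */
void slot_set(ref_slot *s, ref_obj *o)
{
    pthread_mutex_lock(&s->lock);
    ref_obj *old = s->obj;
    s->obj = o;
    pthread_mutex_unlock(&s->lock);
    if (old != NULL)
        ref_release(old); /* may free; no longer reachable via the slot */
}

void slot_clear(ref_slot *s)
{
    slot_set(s, NULL);
}
```

Because the slot owns one reference for as long as it stores the pointer, slot_get() never sees a refcount of 0. After slot_clear(), a caller that already acquired the object keeps it alive until its own ref_release().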
And looking at object_release(), we see that there could be race conditions that the refcount does not protect against:
If acquire and release are called in parallel for an object with a refcount of 1, release could decrement it to 0 before acquire has incremented it. The individual operations are atomic, but the order in which they run is not fixed. So one thread can decrement it, the other thread can increment it and carry on happily, but the object can be freed before it is used.
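The problematic interleaving, and one common way to close it, can be sketched with C11 atomics. The helpers refcount_try_acquire() and refcount_release() are hypothetical names, not existing 389-ds functions. The "acquire unless zero" uses a compare-and-swap loop, so it refuses to bring a refcount back from 0 instead of blindly incrementing it. This only closes the 1 -> 0 -> 1 window: reading the counter still requires the memory to be valid, so it has to be combined with something that keeps the storage alive, such as a lock around the lookup or deferred freeing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Unsafe interleaving with a plain atomic increment (refcount starts at 1):
 *   thread A: release  fetch_sub 1 -> 0, goes on to free the object
 *   thread B: acquire  fetch_add 0 -> 1, returns as if it succeeded
 *   thread A: frees the object; thread B now uses freed memory
 */

/* Hypothetical "acquire unless zero": only increments a nonzero count. */
bool refcount_try_acquire(atomic_int *refcnt)
{
    int cur = atomic_load(refcnt);
    while (cur > 0) {
        /* On failure (including spurious failure) cur is reloaded with the
         * current value, and the loop re-checks it before retrying. */
        if (atomic_compare_exchange_weak(refcnt, &cur, cur + 1))
            return true;
    }
    return false; /* already 0: the object is being destroyed, do not use it */
}

/* Returns true if the caller dropped the last reference and must destroy it. */
bool refcount_release(atomic_int *refcnt)
{
    return atomic_fetch_sub(refcnt, 1) == 1;
}
```

In the interleaving above, thread B would get false from refcount_try_acquire() once thread A has taken the count to 0, instead of silently resurrecting an object that is about to be freed.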