Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

Closed
389-ds-bot opened this issue Sep 13, 2020 · 13 comments
Closed

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

389-ds-bot opened this issue Sep 13, 2020 · 13 comments
Labels
merged Migration flag - PR pr Migration flag - PR

Comments

@389-ds-bot
Copy link

389-ds-bot commented Sep 13, 2020

Cloned from Pagure Pull-Request: https://pagure.io/389-ds-base/pull-request/49989

  • Created at 2018-10-24 16:57:50 by tbordaz (@tbordaz)
  • Merged at 2018-10-25 11:31:13

Bug Description:
During a MODRDN the DN cache is updated to replace
source DN with the target DN (modrdn_rename_entry_update_indexes)
If later a failure occurs (for example if BETXN_POSTOP fails) and
the txn is aborted, the target DN (for the specific entryID) remains
in the DN cache.

If the entry is returned in a search, to build the DN there is
a lookup of the DN cache with the entryID. It retrieves the target DN
rather than the source DN

Fix Description:
In case of failure of the operation, the entry (from the entryID)
need to be cleared from the DN cache

Resolves: #3026

Reviewed by: ?

Platforms tested: F27

Flag Day: no

Doc impact: no

@389-ds-bot 389-ds-bot added merged Migration flag - PR pr Migration flag - PR labels Sep 13, 2020
@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-10-24 23:26:30

This solves another customer's crash. ACK!

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2018-10-25 11:30:30

rebased onto ab4af68

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2018-10-25 11:31:13

Pull-Request has been merged by tbordaz

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-12-05 15:59:11

@tbordaz so your fix cleans up the DN cache, but it does not perform the same process for entry cache. In the customer case I have its the entry cache, not the dn cache, that causes the crash.

I wonder if we also have to do something like:

struct backentry *be = cache_find_id(&inst->inst_cache, ec->ep_id);
CACHE_REMOVE(&inst->inst_cache, be);
CACHE_RETURN(&inst->inst_cache, &be);

Thoughts?

@389-ds-bot
Copy link
Author

Comment from tbordaz (@tbordaz) at 2018-12-05 16:16:25

@mreynolds389, IMHO I think that upon failure, the destination entry in the entrycache is removed few lines below (comment "remove the new entry from the cache if the op failed...").

The entry should be 'cache_is_in_cache' because it is flagged ENTRY_STATE_CREATING at that time.

The difficulty here is that the number of possible paths is so large that without a testcase we can not be sure.

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:22:30

@mreynolds389, IMHO I think that upon failure, the destination entry in the entrycache is removed few lines below (comment "remove the new entry from the cache if the op failed...").

I guess I was wondering if the call below that you mentioned is NOT releasing the correct entry (cache_is_in_cache returning "no"). Perhaps it should be retrieved again (cache_find_id) before we check if its in the cache? Sorry still looking through the code so maybe this is not possible, but since it could happen with the dn cache, then why not the entry cache? :)

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-12-05 16:33:57

but for the ec it is done just after this,

if (ec && inst) ....
just the removal of the dncache was missing in modrdn.

I think we need to look into possible issues in ldbm_delete

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-12-05 16:39:46

In ldbm_back_delete we have remove_e_from_cache to trigger a removal from the ec in some conditions, what about dn cache in that case ?

Mark, would this be a candiate for your debug logging ?

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:44:58

but for the ec it is done just after this,
if (ec && inst) ....
just the removal of the dncache was missing in modrdn.
I think we need to look into possible issues in ldbm_delete

But the crash in delete seems to happen because of a blotched modrdn cache entry. When we call id2entry_add in ldbm_delete it's finding the "previous DN" entry in the cache (which was freed but not remove from the table). Well that was the original crash - not sure hows it's crashing now with Thierry's fix. The new debug patch will tell us more...

@389-ds-bot
Copy link
Author

Comment from lkrispen (@elkris) at 2018-12-05 16:48:09

but I didn't see a MODRDN for the entry in the latest data set :-(

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:55:50

In ldbm_back_delete we have remove_e_from_cache to trigger a removal from the ec in some conditions, what about dn cache in that case ?
Mark, would this be a candiate for your debug logging ?

Unfortunately it looks like that does not come into play until after the crash. It's outside of the txn at that point. But it doesn't hurt to add some logging to see what DN it is...

@389-ds-bot
Copy link
Author

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:56:58

but I didn't see a MODRDN for the entry in the latest data set :-(

Hmmm, well I guess we need to see what the new debug patch tells us...

@389-ds-bot
Copy link
Author

Patch
49989.patch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged Migration flag - PR pr Migration flag - PR
Projects
None yet
Development

No branches or pull requests

1 participant