PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

389-ds-bot · 2020-09-13T04:19:59Z

Cloned from Pagure Pull-Request: https://pagure.io/389-ds-base/pull-request/49989

Created at 2018-10-24 16:57:50 by tbordaz (@tbordaz)
Merged at 2018-10-25 11:31:13

Bug Description:
During a MODRDN the DN cache is updated to replace
source DN with the target DN (modrdn_rename_entry_update_indexes)
If later a failure occurs (for example if BETXN_POSTOP fails) and
the txn is aborted, the target DN (for the specific entryID) remains
in the DN cache.

If the entry is returned in a search, to build the DN there is
a lookup of the DN cache with the entryID. It retrieves the target DN
rather than the source DN

Fix Description:
In case of failure of the operation, the entry (from the entryID)
need to be cleared from the DN cache

Resolves: #3026

Reviewed by: ?

Platforms tested: F27

Flag Day: no

Doc impact: no

The text was updated successfully, but these errors were encountered:

389-ds-bot · 2020-09-13T04:20:01Z

Comment from mreynolds (@mreynolds389) at 2018-10-24 23:26:30

This solves another customer's crash. ACK!

389-ds-bot · 2020-09-13T04:20:02Z

Comment from tbordaz (@tbordaz) at 2018-10-25 11:30:30

rebased onto ab4af68

389-ds-bot · 2020-09-13T04:20:02Z

Comment from tbordaz (@tbordaz) at 2018-10-25 11:31:13

Pull-Request has been merged by tbordaz

389-ds-bot · 2020-09-13T04:20:03Z

Comment from mreynolds (@mreynolds389) at 2018-12-05 15:59:11

@tbordaz so your fix cleans up the DN cache, but it does not perform the same process for entry cache. In the customer case I have its the entry cache, not the dn cache, that causes the crash.

I wonder if we also have to do something like:

struct backentry *be = cache_find_id(&inst->inst_cache, ec->ep_id);
CACHE_REMOVE(&inst->inst_cache, be);
CACHE_RETURN(&inst->inst_cache, &be);

Thoughts?

389-ds-bot · 2020-09-13T04:20:04Z

Comment from tbordaz (@tbordaz) at 2018-12-05 16:16:25

@mreynolds389, IMHO I think that upon failure, the destination entry in the entrycache is removed few lines below (comment "remove the new entry from the cache if the op failed...").

The entry should be 'cache_is_in_cache' because it is flagged ENTRY_STATE_CREATING at that time.

The difficulty here is that the number of possible paths is so large that without a testcase we can not be sure.

389-ds-bot · 2020-09-13T04:20:05Z

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:22:30

@mreynolds389, IMHO I think that upon failure, the destination entry in the entrycache is removed few lines below (comment "remove the new entry from the cache if the op failed...").

I guess I was wondering if the call below that you mentioned is NOT releasing the correct entry (cache_is_in_cache returning "no"). Perhaps it should be retrieved again (cache_find_id) before we check if its in the cache? Sorry still looking through the code so maybe this is not possible, but since it could happen with the dn cache, then why not the entry cache? :)

389-ds-bot · 2020-09-13T04:20:06Z

Comment from lkrispen (@elkris) at 2018-12-05 16:33:57

but for the ec it is done just after this,

if (ec && inst) ....
just the removal of the dncache was missing in modrdn.

I think we need to look into possible issues in ldbm_delete

389-ds-bot · 2020-09-13T04:20:07Z

Comment from lkrispen (@elkris) at 2018-12-05 16:39:46

In ldbm_back_delete we have remove_e_from_cache to trigger a removal from the ec in some conditions, what about dn cache in that case ?

Mark, would this be a candiate for your debug logging ?

389-ds-bot · 2020-09-13T04:20:08Z

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:44:58

but for the ec it is done just after this,
if (ec && inst) ....
just the removal of the dncache was missing in modrdn.
I think we need to look into possible issues in ldbm_delete

But the crash in delete seems to happen because of a blotched modrdn cache entry. When we call id2entry_add in ldbm_delete it's finding the "previous DN" entry in the cache (which was freed but not remove from the table). Well that was the original crash - not sure hows it's crashing now with Thierry's fix. The new debug patch will tell us more...

389-ds-bot · 2020-09-13T04:20:09Z

Comment from lkrispen (@elkris) at 2018-12-05 16:48:09

but I didn't see a MODRDN for the entry in the latest data set :-(

389-ds-bot · 2020-09-13T04:20:09Z

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:55:50

In ldbm_back_delete we have remove_e_from_cache to trigger a removal from the ec in some conditions, what about dn cache in that case ?
Mark, would this be a candiate for your debug logging ?

Unfortunately it looks like that does not come into play until after the crash. It's outside of the txn at that point. But it doesn't hurt to add some logging to see what DN it is...

389-ds-bot · 2020-09-13T04:20:10Z

Comment from mreynolds (@mreynolds389) at 2018-12-05 16:56:58

but I didn't see a MODRDN for the entry in the latest data set :-(

Hmmm, well I guess we need to see what the new debug patch tells us...

389-ds-bot · 2020-09-13T04:20:12Z

Patch
49989.patch

389-ds-bot added merged Migration flag - PR pr Migration flag - PR labels Sep 13, 2020

389-ds-bot closed this as completed Sep 13, 2020

389-ds-bot mentioned this issue Sep 13, 2020

entry cache corruption after failed MODRDN #3026

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

389-ds-bot commented Sep 13, 2020 •

edited

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

PR - Ticket 49967 - entry cache corruption after failed MODRDN #3048

Comments

389-ds-bot commented Sep 13, 2020 • edited

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020

389-ds-bot commented Sep 13, 2020 •

edited