deadlock in replica_write_ruv #344

Closed · 389-ds-bot opened this issue Sep 12, 2020 · 6 comments
Labels: closed: duplicate Migration flag - Issue
Milestone: 1.3.0.a1

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/344


replica_write_ruv() performs its modify with the OP_FLAG_REPL_FIXUP flag, and so do replica_create_ruv_tombstone() and replica_replace_ruv_tombstone(). The OP_FLAG_REPL_FIXUP flag causes the backend database not to be locked:

	if(SERIALLOCK(li) && !operation_is_flag_set(operation,OP_FLAG_REPL_FIXUP)) {
		dblayer_lock_backend(be);
		dblock_acquired= 1;
	}

If the event queue fires replica_write_ruv() at the right time, it will conflict with the same RUV update from replica_replace_ruv_tombstone() or (less likely) replica_create_ruv_tombstone().

I think the solution is to always take the database SERIALLOCK. Since inst->inst_db_mutex is now a PRMonitor instead of a plain mutex, it is already re-entrant for the same thread, which was the original intent of the OP_FLAG_REPL_FIXUP flag - to allow the urp database plugins to modify entries. Alternately, change the urp be pre/post op plugins into betxn pre/post op plugins.
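
For illustration only, a minimal sketch of what "always take the serial lock" could look like, assuming dblayer_lock_backend()/dblayer_unlock_backend() enter and exit the per-instance PRMonitor; this is a sketch of the idea, not an actual patch:

	/* Sketch only: drop the OP_FLAG_REPL_FIXUP exemption and always take the
	 * backend serial lock.  The lock behind dblayer_lock_backend() is assumed
	 * to be the per-instance PRMonitor, which the same thread may re-enter,
	 * so urp fixup modifies running on that thread still get through. */
	if(SERIALLOCK(li)) {
		dblayer_lock_backend(be);	/* PR_EnterMonitor() on inst->inst_db_mutex */
		dblock_acquired= 1;
	}

	/* ... perform the modify ... */

	if(dblock_acquired) {
		dblayer_unlock_backend(be);	/* PR_ExitMonitor() on inst->inst_db_mutex */
	}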

389-ds-bot added the closed: duplicate Migration flag - Issue label Sep 12, 2020
389-ds-bot added this to the 1.3.0.a1 milestone Sep 12, 2020

Comment from rmeggins (@richm) at 2012-08-14 19:56:22

set default ticket origin to Community


Comment from nkinder (@nkinder) at 2012-08-28 04:14:22

Added initial screened field value.


Comment from nhosoi (@nhosoi) at 2012-10-04 05:17:52

Replying to [ticket:344 richm]:

> replica_write_ruv() performs its modify with the OP_FLAG_REPL_FIXUP flag, and so do replica_create_ruv_tombstone() and replica_replace_ruv_tombstone(). The OP_FLAG_REPL_FIXUP flag causes the backend database not to be locked:
>
> 	if(SERIALLOCK(li) && !operation_is_flag_set(operation,OP_FLAG_REPL_FIXUP)) {
> 		dblayer_lock_backend(be);
> 		dblock_acquired= 1;
> 	}
>
> If the event queue fires replica_write_ruv() at the right time, it will conflict with the same RUV update from replica_replace_ruv_tombstone() or (less likely) replica_create_ruv_tombstone().
>
> I think the solution is to always take the database SERIALLOCK. Since inst->inst_db_mutex is now a PRMonitor instead of a plain mutex, it is already re-entrant for the same thread, which was the original intent of the OP_FLAG_REPL_FIXUP flag - to allow the urp database plugins to modify entries. Alternately, change the urp be pre/post op plugins into betxn pre/post op plugins.

In the process of making the plugins betxn aware, the SERIALLOCK acquisition is being moved into dblayer_txn_begin() and the lock is held regardless of the OP_FLAG_REPL_FIXUP flag, as Rich suggested. So this issue would be solved together with the ticket 351 fix.
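
A rough sketch of that direction, with the dblayer_txn_begin() parameters and the ldbminfo lookup simplified here for illustration (the real change lands with the ticket 351 work, not this snippet):

	/* Sketch only: the serial lock is taken inside the transaction-begin
	 * path and no longer consults OP_FLAG_REPL_FIXUP, so replica_write_ruv()
	 * and the tombstone fixup paths all serialize on the same backend lock. */
	int
	dblayer_txn_begin(backend *be, back_txn *txn)	/* parameters simplified */
	{
		struct ldbminfo *li = (struct ldbminfo *)be->be_database->plg_private;

		if(SERIALLOCK(li)) {
			dblayer_lock_backend(be);	/* re-entrant for the owning thread */
		}
		/* ... begin the actual database transaction ... */
		return 0;
	}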

To verify the fix, what would be the best scenario? I ran a fairly heavy stress test (add, modify, and delete operations) for a week against a server containing the 351 patch. The replication topology consisted of 4 masters + 2 hubs + 4 read-only replicas. Would that be good enough to say this bug is solved?


Comment from rmeggins (@richm) at 2012-10-04 05:36:43

Replying to [comment:5 nhosoi]:

> In the process of making the plugins betxn aware, the SERIALLOCK acquisition is being moved into dblayer_txn_begin() and the lock is held regardless of the OP_FLAG_REPL_FIXUP flag, as Rich suggested. So this issue would be solved together with the ticket 351 fix.
>
> To verify the fix, what would be the best scenario? I ran a fairly heavy stress test (add, modify, and delete operations) for a week against a server containing the 351 patch. The replication topology consisted of 4 masters + 2 hubs + 4 read-only replicas. Would that be good enough to say this bug is solved?

Yes.


Comment from nhosoi (@nhosoi) at 2012-10-06 07:27:52

Mark as duplicate of 351.


Comment from nhosoi (@nhosoi) at 2017-02-11 22:53:14

Metadata Update from @nhosoi:

  • Issue assigned to nhosoi
  • Issue set to the milestone: 1.3.0.a1
