deadlock in replica_write_ruv #344

Closed · 389-ds-bot opened this issue Sep 12, 2020 · 6 comments
Labels: closed: duplicate Migration flag - Issue
Milestone: 1.3.0.a1

Cloned from Pagure issue: https://pagure.io/389-ds-base/issue/344


replica_write_ruv() performs its modify with the OP_FLAG_REPL_FIXUP flag, and so do replica_create_ruv_tombstone() and replica_replace_ruv_tombstone(). The OP_FLAG_REPL_FIXUP flag causes the backend database not to be locked:

	if(SERIALLOCK(li) && !operation_is_flag_set(operation,OP_FLAG_REPL_FIXUP)) {
		dblayer_lock_backend(be);
		dblock_acquired= 1;
	}

If the event queue fires replica_write_ruv() at the right time, it will conflict with the same RUV update from replica_replace_ruv_tombstone() or (less likely) replica_create_ruv_tombstone().

I think the solution is to always take the database SERIALLOCK. Since inst->inst_db_mutex is now a PRMonitor instead of a plain mutex, it is already re-entrant for the same thread, which was the original intent of the OP_FLAG_REPL_FIXUP flag - to allow the urp database plugins to modify entries. Alternately, change the urp be pre/post op plugins into betxn pre/post op plugins.
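
For illustration only, a minimal sketch of what "always take the serial lock" could look like, assuming dblayer_lock_backend()/dblayer_unlock_backend() enter and exit the per-instance PRMonitor; this is a sketch of the idea, not an actual patch:

	/* Sketch only: drop the OP_FLAG_REPL_FIXUP exemption and always take the
	 * backend serial lock.  The lock behind dblayer_lock_backend() is assumed
	 * to be the per-instance PRMonitor, which the same thread may re-enter,
	 * so urp fixup modifies running on that thread still get through. */
	if(SERIALLOCK(li)) {
		dblayer_lock_backend(be);	/* PR_EnterMonitor() on inst->inst_db_mutex */
		dblock_acquired= 1;
	}

	/* ... perform the modify ... */

	if(dblock_acquired) {
		dblayer_unlock_backend(be);	/* PR_ExitMonitor() on inst->inst_db_mutex */
	}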

389-ds-bot added the closed: duplicate Migration flag - Issue label Sep 12, 2020
389-ds-bot added this to the 1.3.0.a1 milestone Sep 12, 2020

Comment from rmeggins (@richm) at 2012-08-14 19:56:22

set default ticket origin to Community


Comment from nkinder (@nkinder) at 2012-08-28 04:14:22

Added initial screened field value.


Comment from nhosoi (@nhosoi) at 2012-10-04 05:17:52

Replying to [ticket:344 richm]:

> replica_write_ruv() performs its modify with the OP_FLAG_REPL_FIXUP flag, and so do replica_create_ruv_tombstone() and replica_replace_ruv_tombstone(). The OP_FLAG_REPL_FIXUP flag causes the backend database not to be locked:
>
> 	if(SERIALLOCK(li) && !operation_is_flag_set(operation,OP_FLAG_REPL_FIXUP)) {
> 		dblayer_lock_backend(be);
> 		dblock_acquired= 1;
> 	}
>
> If the event queue fires replica_write_ruv() at the right time, it will conflict with the same RUV update from replica_replace_ruv_tombstone() or (less likely) replica_create_ruv_tombstone().
>
> I think the solution is to always take the database SERIALLOCK. Since inst->inst_db_mutex is now a PRMonitor instead of a plain mutex, it is already re-entrant for the same thread, which was the original intent of the OP_FLAG_REPL_FIXUP flag - to allow the urp database plugins to modify entries. Alternately, change the urp be pre/post op plugins into betxn pre/post op plugins.

In the process of making the plugins betxn aware, the SERIALLOCK acquisition is being moved into dblayer_txn_begin() and the lock is held regardless of the OP_FLAG_REPL_FIXUP flag, as Rich suggested. So this issue would be solved together with the ticket 351 fix.
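
A rough sketch of that direction, with the dblayer_txn_begin() parameters and the ldbminfo lookup simplified here for illustration (the real change lands with the ticket 351 work, not this snippet):

	/* Sketch only: the serial lock is taken inside the transaction-begin
	 * path and no longer consults OP_FLAG_REPL_FIXUP, so replica_write_ruv()
	 * and the tombstone fixup paths all serialize on the same backend lock. */
	int
	dblayer_txn_begin(backend *be, back_txn *txn)	/* parameters simplified */
	{
		struct ldbminfo *li = (struct ldbminfo *)be->be_database->plg_private;

		if(SERIALLOCK(li)) {
			dblayer_lock_backend(be);	/* re-entrant for the owning thread */
		}
		/* ... begin the actual database transaction ... */
		return 0;
	}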

To verify the fix, what would be the best scenario? I ran a fairly heavy stress test (add, modify, and delete operations) for a week against a server containing the 351 patch. The replication topology consisted of 4 masters + 2 hubs + 4 read-only replicas. Would that be good enough to say this bug is solved?


Comment from rmeggins (@richm) at 2012-10-04 05:36:43

Replying to [comment:5 nhosoi]:

> In the process of making the plugins betxn aware, the SERIALLOCK acquisition is being moved into dblayer_txn_begin() and the lock is held regardless of the OP_FLAG_REPL_FIXUP flag, as Rich suggested. So this issue would be solved together with the ticket 351 fix.
>
> To verify the fix, what would be the best scenario? I ran a fairly heavy stress test (add, modify, and delete operations) for a week against a server containing the 351 patch. The replication topology consisted of 4 masters + 2 hubs + 4 read-only replicas. Would that be good enough to say this bug is solved?

Yes.


Comment from nhosoi (@nhosoi) at 2012-10-06 07:27:52

Mark as duplicate of 351.


Comment from nhosoi (@nhosoi) at 2017-02-11 22:53:14

Metadata Update from @nhosoi:

  • Issue assigned to nhosoi
  • Issue set to the milestone: 1.3.0.a1
