CounterServiceDemo: concurrent mass increment blocks clients #39

Closed
belaban opened this issue Oct 6, 2015 · 4 comments

belaban commented Oct 6, 2015

When running two (or more) instances of CounterServiceDemo and incrementing the same counter from two instances at the same time (e.g. press [9] and pick 1000 increments), both instances block.
A preliminary investigation showed that the issue is with REDIRECT.

@belaban belaban added the bug label Oct 6, 2015
@belaban belaban self-assigned this Oct 6, 2015
@belaban belaban added this to the 0.3 milestone Oct 6, 2015
belaban commented Oct 6, 2015

Looks like REDIRECT doesn't block, but is just very slow when the leader itself performs a mass increment of the counter. If A increments by 100 and B by 100,000 (or vice versa) concurrently, then as soon as the first instance is done, the other instance increments very quickly.

belaban commented Oct 15, 2015

Looks like the culprit is RAFT.resend_interval, which defaults to 1000 ms. Lowering it (e.g. to 100 ms) speeds things up dramatically. Investigating why resend_interval is causing this.

belaban commented Oct 15, 2015

The reason why resend_interval is important is the following scenario:

  • A sends messages with indices [34..38]; B's prev_index is 33
  • B receives 34, sets prev_index to 34
  • B receives 35, sets prev_index to 35
  • B receives 37 but drops it, as its prev_index of 36 doesn't match the current prev_index of 35
      ◦ The reason for the out-of-order message is that multiple threads in A grab indices, and the order of delivery then depends on which thread reaches NAKACK2 first
  • B receives 36, sets prev_index to 36
  • B receives 38 but drops it, as its prev_index of 37 doesn't match the current prev_index of 36

So messages with indices 37 and 38 will not get appended to the log of B until the resend_task kicks in (by default every 1000 ms).
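To make the drop rule concrete, here is a minimal Java simulation of B's side of the scenario. It is a hypothetical model, not jgroups-raft code: the class, fields and methods are made up for illustration, and it assumes the leader always sends prev_index = index - 1. Feeding it the delivery order above appends 34, 35 and 36, while 37 and 38 stay unappended until something resends them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a follower that only appends an entry
 * when the entry's prev_index matches the index of its last appended entry.
 * Not jgroups-raft code; all names are illustrative.
 */
public class PrevIndexCheckDemo {

    static final class Follower {
        private int lastIndex;                        // index of the last appended entry
        final List<Integer> appended = new ArrayList<>();
        final List<Integer> dropped = new ArrayList<>();

        Follower(int lastIndex) { this.lastIndex = lastIndex; }

        /** Returns true if the entry was appended, false if it was dropped. */
        boolean receive(int index) {
            int prevIndex = index - 1;                // assumption: leader sends prev_index = index - 1
            if (prevIndex != lastIndex) {
                dropped.add(index);                   // gap caused by reordering: wait for a resend
                return false;
            }
            appended.add(index);
            lastIndex = index;
            return true;
        }
    }

    public static void main(String[] args) {
        Follower b = new Follower(33);
        // Delivery order from the scenario: 37 overtakes 36
        for (int index : new int[] {34, 35, 37, 36, 38}) {
            b.receive(index);
        }
        System.out.println("appended=" + b.appended + " dropped=" + b.dropped);
    }
}
```

Running `main` prints `appended=[34, 35, 36] dropped=[37, 38]`: a single reordered pair is enough to strand every later entry until the resend task fires.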

Suggested solutions:

  1. When sending a message in RAFT, extend the scope of the lock where the index is incremented until after the message has been sent. Downside: reduced concurrency, since concurrent senders are serialized
  2. When receiving a message that's out of order, return false to the sender, but still append the message to the log.
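Solution 1 can be sketched roughly as follows. This is a hypothetical model, not the jgroups-raft implementation: the `Leader` class and its `append()` method are made-up names, and the "transport" is just a list standing in for passing the message down the stack. Because one lock covers both grabbing the next index and sending, indices leave the leader strictly in order, at the cost of serializing all sending threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of solution 1: the leader holds one lock across both
 * "grab the next index" and "send", so messages reach the transport in index
 * order. The transport is modeled as a list.
 */
public class OrderedSendDemo {

    static final class Leader {
        private final Lock lock = new ReentrantLock();
        private int lastIndex;
        final List<Integer> sent = new ArrayList<>();   // stand-in for the transport

        Leader(int lastIndex) { this.lastIndex = lastIndex; }

        void append() {
            lock.lock();
            try {
                int index = ++lastIndex;   // grab the next index ...
                sent.add(index);           // ... and "send" it before releasing the lock
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Leader a = new Leader(33);
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> { for (int i = 0; i < 1000; i++) a.append(); });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        // Each index is sent right after its predecessor, so a follower sees no gaps
        boolean inOrder = true;
        for (int i = 1; i < a.sent.size(); i++) {
            inOrder &= a.sent.get(i) == a.sent.get(i - 1) + 1;
        }
        System.out.println("sent=" + a.sent.size() + " inOrder=" + inOrder);
    }
}
```

With four threads each appending 1000 entries, `main` prints `sent=4000 inOrder=true`. Without the lock spanning the send, two threads could grab 36 and 37 and then hand them to the transport in the opposite order, which is exactly the reordering in the scenario above.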

belaban commented Oct 15, 2015

OK, I chose solution 1.

@belaban belaban closed this as completed Oct 15, 2015
belaban added a commit that referenced this issue Jan 13, 2016
…he index, so we make sure that all updates are received in the same order (established by NAKACK2) [#39]
belaban added a commit that referenced this issue Feb 1, 2016
… the index, so we make sure that all updates are received in the same order (established by NAKACK2) [#39]

- Upgrade Mapdb to version 1.0.8
- Changed JGroups version to 3.6.+ (snapshot)