CounterServiceDemo: concurrent mass increment blocks clients #39

belaban · 2015-10-06T05:44:36Z

When running 2 (or more) instances of CounterServiceDemo and incrementing the same counter from 2 instances at the same time (e.g. press [9] and pick 1000 increments), then both instances block.
A preliminary investigation showed that the issue is with REDIRECT.


belaban · 2015-10-06T06:38:20Z

Looks like REDIRECT doesn't block, but is just very slow when the leader itself performs a mass increment of the counter. If A increments by 100 and B 100'000, or vice versa, concurrently, then - as soon as the first instance is done - the other instance increments very quickly.

belaban · 2015-10-15T11:56:24Z

Looks like the culprit is RAFT.resend_interval, which is 1000 ms by default. Lowering it (e.g. to 100) speeds things up dramatically. Investigating why resend_interval is causing this.

belaban · 2015-10-15T12:43:21Z

The reason why resend_interval is important is the following scenario:

A sends messages with indices [34..38], B's prev_index is 33
B receives 34, sets prev_index to 34
B receives 35, sets prev_index to 35
B receives 37, drops it as its prev_index of 36 doesn't match the current prev_index of 35
   (The message arrives out of order because the threads in A grab their indices first, and the order of message delivery then depends on which thread reaches NAKACK2 first.)
B receives 36, sets prev_index to 36
B receives 38, but drops it as its prev_index of 37 doesn't match the current prev_index of 36

So messages with indices 37 and 38 will not get appended to the log of B until the resend_task kicks in (by default every 1000 ms).
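The scenario can be replayed with a small stand-alone simulation. This is only a sketch of the prev_index check described above, not the actual RAFT code: the class `OutOfOrderAppendDemo`, the `Follower` type and the `replay` helper are hypothetical names made up for illustration, and the resend task is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class OutOfOrderAppendDemo {

    /** Hypothetical follower: appends an entry only if it directly follows the last appended index. */
    static final class Follower {
        int lastIndex;                                  // plays the role of B's prev_index
        final List<Integer> dropped = new ArrayList<>();

        Follower(int lastIndex) {
            this.lastIndex = lastIndex;
        }

        /** Returns true if the entry was appended, false if it was dropped. */
        boolean receive(int index) {
            int prevIndexOfMessage = index - 1;         // the message claims to follow index-1
            if (prevIndexOfMessage != lastIndex) {
                dropped.add(index);                     // mismatch: drop, wait for a resend
                return false;
            }
            lastIndex = index;
            return true;
        }
    }

    /** Feeds the given arrival order to a follower that starts at startIndex. */
    static Follower replay(int startIndex, int... arrivals) {
        Follower follower = new Follower(startIndex);
        for (int index : arrivals) {
            follower.receive(index);
        }
        return follower;
    }

    public static void main(String[] args) {
        // Arrival order from the scenario: 34, 35, 37, 36, 38
        Follower b = replay(33, 34, 35, 37, 36, 38);
        System.out.println("last appended index = " + b.lastIndex); // 36
        System.out.println("dropped = " + b.dropped);               // [37, 38]
    }
}
```

With in-order arrival (34, 35, 36, 37, 38) the same follower ends at index 38 with nothing dropped, which is why only the reordering matters here.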

Suggested solutions:

1. When sending a message in RAFT, extend the scope of the lock where the index is incremented until after the message has been sent. Downside: bad concurrency.
2. When receiving a message that's out of order, return false to the sender, but still append the message to the log.
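A rough sketch of what the first option amounts to: the index is incremented and the message is handed to the transport while the same lock is held, so the send order always matches the index order. All names here (`OrderedSendDemo`, `Leader`, `appendAndSend`, the in-memory `wire` list standing in for the transport) are hypothetical and not taken from the RAFT code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedSendDemo {

    /** Hypothetical leader: assigns an index and sends under one lock. */
    static final class Leader {
        private final ReentrantLock lock = new ReentrantLock();
        private int lastIndex;
        // Stand-in for the transport: records indices in the order they were sent
        private final List<Integer> wire = new ArrayList<>();

        Leader(int lastIndex) {
            this.lastIndex = lastIndex;
        }

        void appendAndSend() {
            lock.lock();
            try {
                int index = ++lastIndex; // grab the next index ...
                wire.add(index);         // ... and send before releasing the lock
            } finally {
                lock.unlock();
            }
        }

        List<Integer> sent() {
            lock.lock();
            try {
                return new ArrayList<>(wire);
            } finally {
                lock.unlock();
            }
        }
    }

    /** Runs several sender threads against one leader and returns the send order. */
    static List<Integer> run(int threads, int sendsPerThread) {
        Leader leader = new Leader(33);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < sendsPerThread; j++) {
                    leader.appendAndSend();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for senders", e);
        }
        return leader.sent();
    }

    /** True if every index is exactly one greater than its predecessor. */
    static boolean isConsecutive(List<Integer> indices) {
        for (int i = 1; i < indices.size(); i++) {
            if (indices.get(i) != indices.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> sent = run(4, 1000);
        System.out.println("sent " + sent.size() + " messages, in order: " + isConsecutive(sent));
    }
}
```

The downside mentioned above shows up directly: every sender serializes on the lock for the whole duration of the send, not only for the index increment.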

belaban · 2015-10-15T16:13:18Z

OK, I chose solution 1

…he index, so we make sure that all updates are received in the same order (established by NAKACK2) [#39]

… the index, so we make sure that all updates are received in the same order (established by NAKACK2) [#39] - Upgrade Mapdb to version 1.0.8 - Changed JGroups version to 3.6.+ (snapshot)

belaban added the bug label Oct 6, 2015

belaban self-assigned this Oct 6, 2015

belaban added this to the 0.3 milestone Oct 6, 2015

belaban closed this as completed Oct 15, 2015

belaban added a commit that referenced this issue Jan 13, 2016

RAFT: sending a message is done inside of the lock which increments t…

6c9d3d4

…he index, so we make sure that all updates are received in the same order (established by NAKACK2) [#39]
