Timeout in get user with strong option makes subsequent weak get fail [JIRA: RCS-250] #1201

shino · 2015-07-29T03:30:18Z

Symptom: a single slow node may cause user fetch to fail.

Fetching a CS user takes two steps. The first step uses PR=all,
so a single slow Riak node can cause a timeout error at the client.
When a timeout occurs in riakc_pb_socket, it disconnects the TCP connection
and goes into a wait-and-retry loop.
The second step then fetches the CS user with the weak option, but the reconnect
has likely not happened yet, so it fails with a disconnected error.
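
The sequence above can be modelled with a small toy simulation. This is only a sketch of the failure mode, not Riak CS or riakc_pb_socket code: `FakePbSocket`, `fetch_user`, the simulated clock, and the simplification that the weak get needs only the fastest node are all assumptions made for illustration.

```python
class FakePbSocket:
    """Toy stand-in for a PB client socket (hypothetical, not the real API)."""

    def __init__(self, reconnect_delay_ms):
        self.reconnect_delay_ms = reconnect_delay_ms
        self.now_ms = 0              # simulated clock, no real sleeping
        self.reconnect_at_ms = None  # None means the connection is up

    def is_connected(self):
        return self.reconnect_at_ms is None or self.now_ms >= self.reconnect_at_ms

    def get(self, key, pr, timeout_ms, node_latencies_ms):
        if not self.is_connected():
            return ("error", "disconnected")
        self.reconnect_at_ms = None
        # Assumption: PR=all waits for the slowest node, the weak get for the fastest.
        if pr == "all":
            latency = max(node_latencies_ms)
        else:
            latency = min(node_latencies_ms)
        if latency > timeout_ms:
            # A timeout drops the connection; reconnect completes only later.
            self.now_ms += timeout_ms
            self.reconnect_at_ms = self.now_ms + self.reconnect_delay_ms
            return ("error", "timeout")
        self.now_ms += latency
        return ("ok", key)


def fetch_user(sock, key, node_latencies_ms, timeout_ms=3000):
    """Two-step fetch: strong get first, weak get immediately on failure."""
    strong = sock.get(key, "all", timeout_ms, node_latencies_ms)
    if strong[0] == "ok":
        return strong
    return sock.get(key, "one", timeout_ms, node_latencies_ms)
```

With one node answering in 10 seconds and the others in 10 ms, the strong get times out and the immediate weak get returns a disconnected error, even though three nodes could have served it.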

If the "slow" node is completely frozen (it produces no response at all),
then after the health check timeout the strong get fails with "insufficient vnodes"
and the weak get should work. In this case, certain users cannot
access Riak CS for a finite period, 60 seconds by default.

Reproduction (or simulation)

Create a 4-node cluster ({get_user_timeout, 3000} in advanced.config may help).

Note the dev2 pid:

DEV2=`ps aux | grep riak_ee | grep dev2 | grep beam.smp | awk '{print $2;}'`

Freeze it: kill -s SIGSTOP $DEV2 (keep your fingers crossed; if you are unlucky, freeze another node 🙉).
Then make any request to Riak CS.
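
The pid lookup and freeze steps can also be scripted. The following is a minimal sketch, assuming the same substring matching as the grep chain above; `find_beam_pid`, `freeze`, and `thaw` are hypothetical helper names, and the `thaw` step (SIGCONT, which resumes a stopped process) is an addition not mentioned in the steps above.

```python
import os
import signal


def find_beam_pid(ps_output, node_name, release="riak_ee"):
    """Return the PID of the first `ps aux` line that mentions the release,
    the node name, and beam.smp, or None if no line matches."""
    for line in ps_output.splitlines():
        if release in line and node_name in line and "beam.smp" in line:
            return int(line.split()[1])  # second column of `ps aux` is the PID
    return None


def freeze(pid):
    """Suspend the process, simulating a silently frozen node."""
    os.kill(pid, signal.SIGSTOP)


def thaw(pid):
    """Resume a process previously suspended with freeze()."""
    os.kill(pid, signal.SIGCONT)
```

One way to use it is to pass the output of `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout` to `find_beam_pid` with the node name, call `freeze` on the result, run the request, and call `thaw` afterwards.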


shino · 2015-09-07T08:10:58Z

For the release notes, short version: Improve user object fetch logic when some nodes are slow or have silently failed.
For the longer version, please refer to the description of this issue.

shino · 2015-09-07T08:11:18Z

Addressed by #1230
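
The title of #1230, "Give grace period for riakc_pb_socket to reconnect", suggests waiting briefly for the connection to come back before the weak get. The sketch below illustrates that idea only; it is not the code from the PR, and `FakePbSocket`, `weak_get_with_grace`, the polling step, and the simulated clock are all assumptions.

```python
class FakePbSocket:
    """Toy socket with a simulated clock (hypothetical, not the real API)."""

    def __init__(self, reconnect_delay_ms):
        self.reconnect_delay_ms = reconnect_delay_ms
        self.now_ms = 0
        self.reconnect_at_ms = None  # None means the connection is up

    def is_connected(self):
        return self.reconnect_at_ms is None or self.now_ms >= self.reconnect_at_ms

    def advance(self, ms):
        self.now_ms += ms

    def drop_connection(self):
        # Models the disconnect that follows a timed-out strong get.
        self.reconnect_at_ms = self.now_ms + self.reconnect_delay_ms

    def weak_get(self, key):
        return ("ok", key) if self.is_connected() else ("error", "disconnected")


def weak_get_with_grace(sock, key, grace_ms, step_ms=100):
    """Poll for reconnection for up to grace_ms, then issue the weak get."""
    waited = 0
    while not sock.is_connected() and waited < grace_ms:
        sock.advance(step_ms)  # stands in for a short sleep
        waited += step_ms
    return sock.weak_get(key)
```

In this model the weak get succeeds whenever the reconnect completes within the grace period, and still reports a disconnected error when it does not.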

Basho-JIRA changed the title ~~Timeout in get user with strong option makes subsequent weak get fail~~ Timeout in get user with strong option makes subsequent weak get fail [JIRA: RCS-250] Jul 29, 2015

Basho-JIRA added the JIRA: To Do label Jul 29, 2015

shino mentioned this issue Aug 31, 2015

Give grace period for riakc_pb_socket to reconnect #1230

Merged

Basho-JIRA added JIRA: In Progress JIRA: Needs Review and removed JIRA: To Do JIRA: In Progress labels Sep 7, 2015

Basho-JIRA assigned shino Sep 7, 2015

shino added this to the 2.1.0 milestone Sep 7, 2015

shino closed this as completed Sep 7, 2015

Basho-JIRA added JIRA: Closed and removed JIRA: Needs Review labels Sep 7, 2015
