
PreferSlave seems not honored after cache failover #1077

Closed
VicaryM opened this issue Mar 2, 2019 · 2 comments · Fixed by #1374

Comments


VicaryM commented Mar 2, 2019

We set up a Redis shard with 1 primary and 2 read replicas, and specified CommandFlags.PreferSlave to prefer reads from the read replicas.
Initially we had 001 (primary, no reads), 002 (read replica, getting half of the reads), and 003 (read replica, getting half of the reads). All as expected.
Then we failed over 001. After that:
001 (no reads), 002 (getting all the reads), 003 (no reads).
Based on the AWS logs, 002 became the primary.
Why is 002 getting all the reads after the failover? We expected the two read replicas (001 and 003) to share the reads, since we specified PreferSlave.
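The routing the report expects can be sketched as a toy model (plain Python, not StackExchange.Redis code): with PreferSlave, reads rotate across whatever endpoints the client currently believes are replicas, so a client that correctly observes the failover should start reading from 001 and 003. The class and method names here are invented for illustration only.

```python
# Toy model of PreferSlave-style read routing. Not the library's
# implementation; it only illustrates the expected behavior: reads
# rotate across known replicas, falling back to the primary only
# when no replica is available.
from itertools import cycle


class ShardView:
    """One client's view of a shard: endpoint name -> role."""

    def __init__(self, roles):
        self.roles = dict(roles)
        self._rebuild()

    def _rebuild(self):
        replicas = [e for e, r in self.roles.items() if r == "replica"]
        self._replica_cycle = cycle(replicas) if replicas else None

    def promote(self, endpoint):
        """Simulate a failover: `endpoint` becomes primary, the old
        primary demotes. A client that never refreshes its view skips
        this step and keeps routing against stale roles."""
        for e, r in self.roles.items():
            if r == "primary":
                self.roles[e] = "replica"
        self.roles[endpoint] = "primary"
        self._rebuild()

    def route_read_prefer_slave(self):
        """Pick the next replica round-robin; fall back to the primary."""
        if self._replica_cycle is not None:
            return next(self._replica_cycle)
        return next(e for e, r in self.roles.items() if r == "primary")


view = ShardView({"001": "primary", "002": "replica", "003": "replica"})
before = {view.route_read_prefer_slave() for _ in range(4)}  # {"002", "003"}
view.promote("002")                                          # failover
after = {view.route_read_prefer_slave() for _ in range(4)}   # {"001", "003"}
```

In this model, a client whose `ShardView` is refreshed after the failover spreads reads over 001 and 003; the behavior observed in the report (all reads on 002) matches a view that was never rebuilt with the new roles.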


VicaryM commented Apr 1, 2019

We consulted AWS, and here is their reply:
"Regarding this behavior of the cluster, I suspect it is because the client connection is still not updated with the new configuration and is still bound to the configuration before the failover. To test this behavior I recommend you to create a new test application and connect to it in a similar way. If the requests are correctly redirected according to the commandflags, we can conclude that it is still picking up the stale configuration."

We verified that a new connection sends reads correctly to the read replicas.
It seems the client code does not automatically update the cluster configuration when it changes due to a failover?
Based on the observed behavior in the main post, it looks like the clients updated the cluster configuration while the failover took place (due to the connection failure?), resulting in a configuration that has only a single master and no slaves.

@NickCraver
Collaborator

I think this is likely the same issue as #1120 under the covers. It's not that the slave isn't used; it's that the connection is generally in a bad state after the disconnect. A fix for this is slated for 2.1.x, currently in progress in #1374. Closing this just to consolidate issues and keep the discussion in one spot.
