
PreferSlave seems not honored after cache failover #1077

Closed
VicaryM opened this issue Mar 2, 2019 · 2 comments · Fixed by #1374

Comments


VicaryM commented Mar 2, 2019

We set up a Redis shard with 1 primary and 2 read replicas, and specified CommandFlags.PreferSlave to prefer reads from the read replicas.
Initially we had 001 (primary, no reads), 002 (read replica, getting half of the reads), and 003 (read replica, getting half of the reads). All as expected.
Then we failed over 001. After that:
001 (no reads), 002 (getting all the reads), 003 (no reads).
Based on the AWS logs, 002 became the primary.
Why is 002 getting all the reads after the failover? We expected the two read replicas (001 and 003) to share the reads, since we specified PreferSlave.
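The routing the report expects can be sketched as a toy model (plain Python, not StackExchange.Redis code): with PreferSlave, reads rotate across whatever endpoints the client currently believes are replicas, so a client that correctly observes the failover should start reading from 001 and 003. The class and method names here are invented for illustration only.

```python
# Toy model of PreferSlave-style read routing. Not the library's
# implementation; it only illustrates the expected behavior: reads
# rotate across known replicas, falling back to the primary only
# when no replica is available.
from itertools import cycle


class ShardView:
    """One client's view of a shard: endpoint name -> role."""

    def __init__(self, roles):
        self.roles = dict(roles)
        self._rebuild()

    def _rebuild(self):
        replicas = [e for e, r in self.roles.items() if r == "replica"]
        self._replica_cycle = cycle(replicas) if replicas else None

    def promote(self, endpoint):
        """Simulate a failover: `endpoint` becomes primary, the old
        primary demotes. A client that never refreshes its view skips
        this step and keeps routing against stale roles."""
        for e, r in self.roles.items():
            if r == "primary":
                self.roles[e] = "replica"
        self.roles[endpoint] = "primary"
        self._rebuild()

    def route_read_prefer_slave(self):
        """Pick the next replica round-robin; fall back to the primary."""
        if self._replica_cycle is not None:
            return next(self._replica_cycle)
        return next(e for e, r in self.roles.items() if r == "primary")


view = ShardView({"001": "primary", "002": "replica", "003": "replica"})
before = {view.route_read_prefer_slave() for _ in range(4)}  # {"002", "003"}
view.promote("002")                                          # failover
after = {view.route_read_prefer_slave() for _ in range(4)}   # {"001", "003"}
```

In this model, a client whose `ShardView` is refreshed after the failover spreads reads over 001 and 003; the behavior observed in the report (all reads on 002) matches a view that was never rebuilt with the new roles.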


VicaryM commented Apr 1, 2019

We consulted AWS, and here is their reply:
"Regarding this behavior of the cluster, I suspect it is because the client connection is still not updated with the new configuration and is still bound to the configuration before the failover. To test this behavior I recommend you to create a new test application and connect to it in a similar way. If the requests are correctly redirected according to the commandflags, we can conclude that it is still picking up the stale configuration."

We verified that a new connection sends reads correctly to the read replicas.
It seems the client code does not automatically update the cluster configuration when it changes due to a failover?
Based on the observed behavior in the main post, it looks like the clients updated the cluster configuration while the failover took place (due to the connection failure?), resulting in a configuration that has only a single master and no slaves.

@NickCraver
Collaborator

I think this is likely the same issue as #1120 under the covers. It's not that the slave isn't used; it's that the connection is generally in a bad state after the disconnect. A fix for this is slated for 2.1.x, currently in progress in #1374. Closing this just to consolidate issues and keep the discussion in one spot.
