"Command cannot be issued to a slave" after a Redis cluster failover #282

Closed
actionthomas opened this issue Sep 21, 2015 · 13 comments

@actionthomas

I'm currently testing failover with the latest client version (1.0.481).
When Redis performs a failover (for testing, I issue the "cluster failover" command manually) and I then attempt a write operation, the client throws a MasterOnly exception.
Yet on the Redis side, the master and slave roles have already switched.

My setup: a six-instance Redis cluster.

127.0.0.1:6379> cluster nodes
4b1bc1008e8efae3b34a71a9370d544d3c08caea 10.9.28.73:6381 master - 0 1442828232228 25 connected 5461-10922
60c6a00e1d72f91e3faee4206a1202fa653b78bd 10.9.28.74:6379 myself,slave 4b1bc1008e8efae3b34a71a9370d544d3c08caea 0 0 24 connected
0b8378cca831e69118b066f78485cc5ee22a86d3 10.9.28.74:6380 slave bfe3865e346c5cee1e6c754701569ed45423b114 0 1442828232556 4 connected
2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 10.9.28.73:6380 master - 0 1442828231900 2 connected 10923-16383
bfe3865e346c5cee1e6c754701569ed45423b114 10.9.28.73:6379 master - 0 1442828231464 1 connected 0-5460
1e7ce773ae14cba010b0d6e5390fd9e7305f40af 10.9.28.74:6381 slave 2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 0 1442828232992 6 connected

The C# test client:

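using System;
using StackExchange.Redis;

internal static class Program
{
    private static void Main(string[] args)
    {
        // Connect to all six cluster nodes; connection diagnostics are written to the console.
        var mpx = ConnectionMultiplexer.Connect("10.9.28.73:6379,10.9.28.73:6380,10.9.28.74:6379,10.9.28.74:6380,10.9.28.73:6381,10.9.28.74:6381", Console.Out);
        var db = mpx.GetDatabase();
        db.StringSet("test", "value!");

        // Keep issuing a write command so that a failover surfaces as an exception.
        while (true)
        {
            try
            {
                db = mpx.GetDatabase();
                db.KeyExpire("test", TimeSpan.FromMinutes(1));
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }
        }
    }
}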

After issuing the failover command, the nodes look like this:

127.0.0.1:6379> cluster nodes
4b1bc1008e8efae3b34a71a9370d544d3c08caea 10.9.28.73:6381 slave 60c6a00e1d72f91e3faee4206a1202fa653b78bd 0 1442829978122 28 connected
60c6a00e1d72f91e3faee4206a1202fa653b78bd 10.9.28.74:6379 myself,master - 0 0 28 connected 5461-10922
0b8378cca831e69118b066f78485cc5ee22a86d3 10.9.28.74:6380 slave bfe3865e346c5cee1e6c754701569ed45423b114 0 1442829977686 4 connected
2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 10.9.28.73:6380 master - 0 1442829978450 2 connected 10923-16383
bfe3865e346c5cee1e6c754701569ed45423b114 10.9.28.73:6379 master - 0 1442829978013 1 connected 0-5460
1e7ce773ae14cba010b0d6e5390fd9e7305f40af 10.9.28.74:6381 slave 2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 0 1442829979214 6 connected

(10.9.28.74:6379 is now a master.)
The client then catches this exception:

StackExchange.Redis.RedisConnectionException: ProtocolFailure on EXPIRE ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a slave: EXPIRE test
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServer(PhysicalConnection connection, Message message) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\PhysicalBridge.cs:line 766
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1935
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 80
   at StackExchange.Redis.RedisDatabase.KeyExpire(RedisKey key, Nullable`1 expiry, CommandFlags flags) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisDatabase.cs:line 420
   at test.Program.Main(String[] args) in test\Program.cs:line 62

Here is the verbose output of the Redis client:

Read sockets: 1
More bytes available: 29 (29)
Error: MOVED 6918 10.9.28.74:6379
Matching result...
Response to: [0]:EXPIRE test (BooleanProcessor)
Now pending: 1
Requesting write: 10.9.28.74:6379/Interactive
Writing queue from bridge
Now pending: 0
Writing: [0]:EXPIRE test (BooleanProcessor)
EXPIRE re-issued to 10.9.28.74:6379
Processed: 1
Buffer exhausted
Write failed: Command cannot be issued to a slave: EXPIRE test
Pulsed
Completed synchronously: [0]:EXPIRE test (BooleanProcessor)
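
For context on the `MOVED 6918 10.9.28.74:6379` line in this log: Redis Cluster assigns each key to one of 16384 hash slots, computed as CRC16 of the key modulo 16384, and answers with `MOVED <slot> <host:port>` when a command reaches a node that does not own the slot. The sketch below is a minimal, standalone slot calculation. The class and method names (`ClusterSlot`, `HashSlot`) are illustrative and are not part of StackExchange.Redis.

```csharp
using System;
using System.Text;

public static class ClusterSlot
{
    // CRC16 (XMODEM variant): polynomial 0x1021, initial value 0,
    // the checksum Redis Cluster uses for key hashing.
    private static ushort Crc16(byte[] data)
    {
        ushort crc = 0;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 0x8000) != 0
                    ? (ushort)((crc << 1) ^ 0x1021)
                    : (ushort)(crc << 1);
            }
        }
        return crc;
    }

    // Returns the hash slot (0..16383) for a key.
    public static int HashSlot(string key)
    {
        // Hash tags: if the key contains "{...}" with non-empty content,
        // only the text between the first '{' and the next '}' is hashed.
        int open = key.IndexOf('{');
        if (open >= 0)
        {
            int close = key.IndexOf('}', open + 1);
            if (close > open + 1)
            {
                key = key.Substring(open + 1, close - open - 1);
            }
        }
        return Crc16(Encoding.UTF8.GetBytes(key)) % 16384;
    }
}
```

`ClusterSlot.HashSlot("test")` returns 6918, matching the slot in the `MOVED` reply. That slot falls in the 5461-10922 range, which is served by the master/slave pair that swapped roles in the failover, so every command on the key `test` is routed to exactly the nodes whose roles changed.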

It looks like the client holds a kind of static list of servers and their roles, which is not updated after a failover.
Thanks for your help.
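
Until the client refreshes its view of the cluster, one possible caller-side workaround is to retry the write after asking the client to re-read its configuration. The following is only a sketch under stated assumptions: `FailoverRetry`, its parameters, and the default attempt count and delay are hypothetical names and values chosen for illustration, and nothing in this thread confirms that a refresh actually cures the stale role information in the affected versions.

```csharp
using System;
using System.Threading;

public static class FailoverRetry
{
    // Runs `operation`. When it throws an exception that `isReplicaError` accepts,
    // calls `refreshTopology`, waits `delayMs`, and tries again, for at most
    // `maxAttempts` attempts in total. Any other exception, or a failure on the
    // last attempt, propagates to the caller unchanged.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isReplicaError,
        Action refreshTopology,
        int maxAttempts = 3,
        int delayMs = 200)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isReplicaError(ex))
            {
                refreshTopology();
                Thread.Sleep(delayMs);
            }
        }
    }

    // Possible use with the test client above (assumes `mpx` and `db` from that program,
    // and assumes that ConnectionMultiplexer.Configure() picks up the new roles):
    //
    //   FailoverRetry.Execute(
    //       () => db.KeyExpire("test", TimeSpan.FromMinutes(1)),
    //       ex => ex is RedisCommandException || ex.InnerException is RedisCommandException,
    //       () => mpx.Configure());
}
```

The helper takes the error test and the refresh step as delegates so that it has no dependency on the Redis client. A bounded retry is used deliberately: if the topology never updates, the original exception still reaches the caller instead of the loop spinning forever.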

@actionthomas
Author

The issue is still present in version 1.1.553 alpha.

@NickCraver
Collaborator

Ok gotcha - looks like the topology isn't updating properly on cluster changes, which may explain a few bugs - we'll take a look. We don't have a cluster readily set up, since we don't use one at Stack, so it may be a bit before we can test.

@NickCraver
Collaborator

This is likely also solved via #314 and the latest commit. We'll try and get a new package up this week or next...still going through many of the issues in the backlog.

@thethomp

I appear to still be hitting this bug (specifically for reads - HGET/HGETALL) when I CLUSTER FAILOVER manually. Has anyone confirmed if this is fixed in any recent version? I'm currently using 1.1.608.

@NickCraver
Collaborator

Lots of changes since this was filed have addressed the issue. If anyone's still seeing this in 1.2.6 or later, please ping and I'll re-open. Right now I'm unable to reproduce it and haven't gotten any reports on recent versions.

@Efp95

Efp95 commented Mar 18, 2019

@NickCraver Hi, in a similar scenario I got the same error for the ZADD command.

@zvolkov

zvolkov commented Dec 3, 2019

We get this error in Production as well as in the lower environments. We're running 2.0.601 against Azure Redis Cache.

It happens only a few times a month, every episode lasting no more than 30 seconds. Mostly happens on DEL commands, sometimes on PEXPIRE and UNLINK. Looking at the app logs I don't see anything interesting on our side leading up to this, so it seems to be entirely induced by something happening on the Azure Redis Cache side.

Any ideas how I can further troubleshoot this, what else I can look at?

StackExchange.Redis.RedisCommandException: Command cannot be issued to a slave: DEL 4 [key]
   at StackExchange.Redis.ConnectionMultiplexer.PrepareToPushMessageToBridge[T](Message message, ResultProcessor`1 processor, IResultBox`1 resultBox, ServerEndPoint& server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 1982
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteAsyncImpl[T](Message message, ResultProcessor`1 processor, Object state, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2139
   at StackExchange.Redis.RedisBase.ExecuteAsync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisBase.cs:line 47
   at StackExchange.Redis.RedisDatabase.KeyDeleteAsync(RedisKey key, CommandFlags flags) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisDatabase.cs:line 609

@bryanrideshark

I am also getting this error, also on Azure Redis Cache.

@mlstubblefield

I'm getting this error in version 2.1.58 with the error message...
Command cannot be issued to a replica: UNLINK [key]

@NickCraver
Collaborator

A lot was improved here in the 2.2.50 release - if you can please try the latest as it got a lot of love specifically on connect and reconnect scenarios.

@dscaravaggi

> A lot was improved here in the 2.2.50 release - if you can please try the latest as it got a lot of love specifically on connect and reconnect scenarios.

Yesterday I upgraded the NuGet package to the 2.2.60 release, and today we ran a scheduled node failover:

RedisConnectionException: InternalFailure on UNLINK StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: UNLINK 
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1266
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2849
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
   at StackExchange.Redis.RedisDatabase.KeyDelete(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 609
   at SmartServices.Core.Services.Cache.Imp.RedisCacheService.SafeInvalidateCache(String cacheKey) in C:\BuildAgent\work\ee7faa20632e72c2\SmartServices.Core\Services\Cache\Imp\RedisCacheService.cs:line 114

Maybe there is some additional configuration setting for transparent failover?

@ecobisi

ecobisi commented Jan 16, 2023

@NickCraver Unfortunately, the issue is still there with the latest version 2.6.90.

Here is the stack trace from a recent exception we have observed, immediately after an automatic fail-over which happened on a cluster running Redis 7.0.5:

StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX foo (BooleanProcessor)
---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX foo
at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1374
--- End of inner exception stack trace ---

Ideally, a dedicated exception would make this kind of issue easier to detect on the caller side, and could possibly be used to trigger an update of the cluster node map.
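
In the absence of a dedicated exception type, a caller can approximate the detection by inspecting the exception chain. The sketch below is an assumption-laden stopgap: `ReplicaErrorDetector` and `IsReplicaWriteError` are hypothetical names, and the check relies only on the message text seen in the stack traces in this thread ("...to a slave" in older versions, "...to a replica" in newer ones), which could change between library versions.

```csharp
using System;

public static class ReplicaErrorDetector
{
    // Text shared by both wordings reported in this thread:
    // "Command cannot be issued to a slave: ..." and
    // "Command cannot be issued to a replica: ...".
    private const string Marker = "Command cannot be issued to a ";

    // Walks the InnerException chain, because the traces in this thread show the
    // command exception both thrown directly and wrapped in a connection exception.
    public static bool IsReplicaWriteError(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current.Message.IndexOf(Marker, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

Matching on message text is brittle, which is exactly why a dedicated exception type would be preferable; the helper only isolates that brittleness in one place so it can be swapped out if the library ever exposes a typed alternative.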

Thanks for your hard work on SE.Redis!

@Wyzzcow

Wyzzcow commented Jun 1, 2023

Hi,
Indeed this is still an issue even at 2.6.111.
What is the recommended way of handling cluster failover?
It is trivial to reproduce locally with docker if that helps.
