"Command cannot be issued to a slave" after a Redis cluster failover #282

actionthomas · 2015-09-21T10:16:04Z

I'm currently testing failover with the latest client version (1.0.481).
When Redis performs a failover (for testing purposes, I'm issuing the "cluster failover" command manually) and I then try a write operation, the client throws a MasterOnly exception.
However, on the Redis side the master and slave roles have already changed.

My setup: a 6-instance Redis cluster (3 masters, each with 1 slave).

127.0.0.1:6379> cluster nodes
4b1bc1008e8efae3b34a71a9370d544d3c08caea 10.9.28.73:6381 master - 0 1442828232228 25 connected 5461-10922
60c6a00e1d72f91e3faee4206a1202fa653b78bd 10.9.28.74:6379 myself,slave 4b1bc1008e8efae3b34a71a9370d544d3c08caea 0 0 24 connected
0b8378cca831e69118b066f78485cc5ee22a86d3 10.9.28.74:6380 slave bfe3865e346c5cee1e6c754701569ed45423b114 0 1442828232556 4 connected
2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 10.9.28.73:6380 master - 0 1442828231900 2 connected 10923-16383
bfe3865e346c5cee1e6c754701569ed45423b114 10.9.28.73:6379 master - 0 1442828231464 1 connected 0-5460
1e7ce773ae14cba010b0d6e5390fd9e7305f40af 10.9.28.74:6381 slave 2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 0 1442828232992 6 connected

The C# test client:

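using System;
using StackExchange.Redis;

internal static class Program
{
    private static void Main(string[] args)
    {
        // Seed the multiplexer with all six cluster endpoints and log connection details to the console.
        var mpx = ConnectionMultiplexer.Connect("10.9.28.73:6379,10.9.28.73:6380,10.9.28.74:6379,10.9.28.74:6380,10.9.28.73:6381,10.9.28.74:6381", Console.Out);
        var db = mpx.GetDatabase();
        db.StringSet("test", "value!");

        while (true)
        {
            try
            {
                // GetDatabase() returns a cheap pass-through object; it does not refresh the topology.
                db = mpx.GetDatabase();
                // EXPIRE is a write, so it must be routed to the master that owns the key's slot.
                db.KeyExpire("test", TimeSpan.FromMinutes(1));
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }
        }
    }
}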

After issuing the failover command, the nodes are:

127.0.0.1:6379> cluster nodes
4b1bc1008e8efae3b34a71a9370d544d3c08caea 10.9.28.73:6381 slave 60c6a00e1d72f91e3faee4206a1202fa653b78bd 0 1442829978122 28 connected
60c6a00e1d72f91e3faee4206a1202fa653b78bd 10.9.28.74:6379 myself,master - 0 0 28 connected 5461-10922
0b8378cca831e69118b066f78485cc5ee22a86d3 10.9.28.74:6380 slave bfe3865e346c5cee1e6c754701569ed45423b114 0 1442829977686 4 connected
2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 10.9.28.73:6380 master - 0 1442829978450 2 connected 10923-16383
bfe3865e346c5cee1e6c754701569ed45423b114 10.9.28.73:6379 master - 0 1442829978013 1 connected 0-5460
1e7ce773ae14cba010b0d6e5390fd9e7305f40af 10.9.28.74:6381 slave 2943a1c1f78a60729ccb1b2c2f2fb96ad4600ede 0 1442829979214 6 connected

(10.9.28.74:6379 is now a master.)
The client catches the following exception:

StackExchange.Redis.RedisConnectionException: ProtocolFailure on EXPIRE ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a slave: EXPIRE test
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServer(PhysicalConnection connection, Message message) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\PhysicalBridge.cs:line 766
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1935
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisBase.cs:line 80
   at StackExchange.Redis.RedisDatabase.KeyExpire(RedisKey key, Nullable`1 expiry, CommandFlags flags) in C:\Repos\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\RedisDatabase.cs:line 420
   at test.Program.Main(String[] args) in test\Program.cs:line 62

Here is the verbose output of the Redis client:

Read sockets: 1
More bytes available: 29 (29)
Error: MOVED 6918 10.9.28.74:6379
Matching result...
Response to: [0]:EXPIRE test (BooleanProcessor)
Now pending: 1
Requesting write: 10.9.28.74:6379/Interactive
Writing queue from bridge
Now pending: 0
Writing: [0]:EXPIRE test (BooleanProcessor)
EXPIRE re-issued to 10.9.28.74:6379
Processed: 1
Buffer exhausted
Write failed: Command cannot be issued to a slave: EXPIRE test
Pulsed
Completed synchronously: [0]:EXPIRE test (BooleanProcessor)
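For context, the "MOVED 6918 10.9.28.74:6379" line above is the Redis Cluster redirect: the server reports the key's hash slot and the endpoint that now owns that slot. The sketch below reproduces that mapping, assuming the rules from the Redis Cluster specification (CRC16 in the XMODEM variant over the key, or over its {hash tag}, modulo 16384). ClusterSlots, HashSlot and TryParseMoved are hypothetical names for illustration, not StackExchange.Redis API.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating Redis Cluster key-to-slot mapping and MOVED parsing.
public static class ClusterSlots
{
    public const int SlotCount = 16384;

    // CRC16 as used by Redis Cluster (XMODEM variant: poly 0x1021, init 0, no reflection).
    public static ushort Crc16(byte[] data)
    {
        ushort crc = 0;
        foreach (byte b in data)
        {
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                // Shift left; if the bit shifted out was set, XOR in the polynomial.
                crc = (crc & 0x8000) != 0
                    ? (ushort)((crc << 1) ^ 0x1021)
                    : (ushort)(crc << 1);
            }
        }
        return crc;
    }

    // Hash slot of a key. If the key contains a non-empty {...} section, only the
    // part between the first '{' and the next '}' is hashed (a "hash tag").
    public static int HashSlot(string key)
    {
        int open = key.IndexOf('{');
        if (open >= 0)
        {
            int close = key.IndexOf('}', open + 1);
            if (close > open + 1)
            {
                key = key.Substring(open + 1, close - open - 1);
            }
        }
        return Crc16(Encoding.UTF8.GetBytes(key)) % SlotCount;
    }

    // Parses a redirect such as "MOVED 6918 10.9.28.74:6379"
    // (the error text without the leading '-' of the RESP error reply).
    public static bool TryParseMoved(string error, out int slot, out string endpoint)
    {
        slot = -1;
        endpoint = "";
        string[] parts = error.Split(' ');
        if (parts.Length != 3 || parts[0] != "MOVED" || !int.TryParse(parts[1], out int parsed))
        {
            return false;
        }
        slot = parsed;
        endpoint = parts[2];
        return true;
    }
}
```

HashSlot("test") gives 6918, matching the redirect in the log, and slot 6918 falls inside the 5461-10922 range that 10.9.28.74:6379 serves after the failover. So the redirect itself pointed at the right node; the write failed afterwards, at the point where the client still treated 10.9.28.74:6379 as a slave.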

It looks like the client keeps a static list of servers and their roles (master or slave), and that list is not updated after a failover.
Thanks for your help.
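To make that hypothesis concrete: the role information a client needs is in the CLUSTER NODES output shown above, where the third field holds the flags (master or slave) and any fields after the eighth list slot ranges. The sketch below is a minimal parser under that assumed layout; ClusterNode, ClusterNodesParser and FindMasterForSlot are hypothetical names, and this is not how StackExchange.Redis stores its topology internally.

```csharp
using System;
using System.Collections.Generic;

// One parsed line of CLUSTER NODES output (only the fields needed here).
public sealed class ClusterNode
{
    public string Id = "";
    public string Endpoint = "";
    public bool IsMaster;
    public string MasterId = "-";
    public List<int[]> SlotRanges = new List<int[]>();
}

// Hypothetical parser for illustration only.
public static class ClusterNodesParser
{
    // Assumed layout: <id> <ip:port[@cport]> <flags> <master-id> <ping-sent>
    //                 <pong-recv> <config-epoch> <link-state> [<slot-range> ...]
    public static ClusterNode ParseLine(string line)
    {
        string[] f = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (f.Length < 8)
        {
            throw new FormatException("Not a CLUSTER NODES line: " + line);
        }
        var node = new ClusterNode
        {
            Id = f[0],
            Endpoint = f[1].Split('@')[0], // newer servers append "@cport"
            IsMaster = Array.IndexOf(f[2].Split(','), "master") >= 0,
            MasterId = f[3],
        };
        for (int i = 8; i < f.Length; i++)
        {
            if (f[i].StartsWith("[", StringComparison.Ordinal))
            {
                continue; // skip slot migration markers such as "[slot->-id]"
            }
            string[] r = f[i].Split('-');
            int start = int.Parse(r[0]);
            int end = r.Length > 1 ? int.Parse(r[1]) : start;
            node.SlotRanges.Add(new[] { start, end });
        }
        return node;
    }

    // Endpoint of the master serving the slot according to these lines, or null if none.
    public static string FindMasterForSlot(IEnumerable<string> lines, int slot)
    {
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue;
            }
            ClusterNode node = ParseLine(line);
            if (!node.IsMaster)
            {
                continue;
            }
            foreach (int[] range in node.SlotRanges)
            {
                if (slot >= range[0] && slot <= range[1])
                {
                    return node.Endpoint;
                }
            }
        }
        return null;
    }
}
```

Fed the two CLUSTER NODES outputs from this report, FindMasterForSlot(lines, 6918) returns 10.9.28.73:6381 before the failover and 10.9.28.74:6379 after it. A cached copy of the first table would still list 10.9.28.74:6379 as a slave, which is consistent with the failure in the verbose log.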


actionthomas · 2016-01-26T10:57:48Z

The issue is still present in version 1.1.553 alpha.

NickCraver · 2016-01-26T11:07:03Z

OK, gotcha. It looks like the topology isn't updating properly on cluster changes, which may explain a few bugs; we'll take a look. We don't have a cluster readily set up since we don't use one at Stack, so it may be a while before we can test.

NickCraver · 2016-01-28T03:31:16Z

This is likely also solved by #314 and the latest commit. We'll try to get a new package up this week or next; we're still going through many of the issues in the backlog.

thethomp · 2017-01-27T17:52:05Z

I appear to still be hitting this bug (specifically for reads: HGET/HGETALL) when I run CLUSTER FAILOVER manually. Has anyone confirmed whether this is fixed in any recent version? I'm currently using 1.1.608.

NickCraver · 2018-06-01T23:14:54Z

Lots of changes addressing this issue have gone in since it was filed. If anyone is still seeing this in 1.2.6 or later, please ping me and I'll reopen. Right now I'm unable to reproduce it and haven't had any reports on recent versions.

Efp95 · 2019-03-18T18:25:22Z

@NickCraver Hi, in a similar scenario I got the same error for a ZADD command.

zvolkov · 2019-12-03T19:11:57Z

We get this error in Production as well as in the lower environments. We're running 2.0.601 against Azure Redis Cache.

It happens only a few times a month, every episode lasting no more than 30 seconds. Mostly happens on DEL commands, sometimes on PEXPIRE and UNLINK. Looking at the app logs I don't see anything interesting on our side leading up to this, so it seems to be entirely induced by something happening on the Azure Redis Cache side.

Any ideas how I can further troubleshoot this, what else I can look at?

StackExchange.Redis.RedisCommandException: Command cannot be issued to a slave: DEL 4 [key]
   at StackExchange.Redis.ConnectionMultiplexer.PrepareToPushMessageToBridge[T](Message message, ResultProcessor`1 processor, IResultBox`1 resultBox, ServerEndPoint& server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 1982
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteAsyncImpl[T](Message message, ResultProcessor`1 processor, Object state, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2139
   at StackExchange.Redis.RedisBase.ExecuteAsync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisBase.cs:line 47
   at StackExchange.Redis.RedisDatabase.KeyDeleteAsync(RedisKey key, CommandFlags flags) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisDatabase.cs:line 609

bryanrideshark · 2021-04-06T22:42:49Z

I am also getting this error, also with Azure Redis Cache.

mlstubblefield · 2021-06-24T14:02:41Z

I'm getting this error in version 2.1.58, with the error message:
Command cannot be issued to a replica: UNLINK [key]

NickCraver · 2021-06-26T01:44:49Z

A lot was improved here in the 2.2.50 release; if you can, please try the latest, as it got a lot of love specifically in connect and reconnect scenarios.

dscaravaggi · 2021-07-15T15:03:07Z

> A lot was improved here in the 2.2.50 release; if you can, please try the latest, as it got a lot of love specifically in connect and reconnect scenarios.

Yesterday I upgraded the NuGet package to the 2.2.60 release, and today we scheduled a node failover, which produced:

RedisConnectionException: InternalFailure on UNLINK StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: UNLINK 
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1266
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2849
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
   at StackExchange.Redis.RedisDatabase.KeyDelete(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 609
   at SmartServices.Core.Services.Cache.Imp.RedisCacheService.SafeInvalidateCache(String cacheKey) in C:\BuildAgent\work\ee7faa20632e72c2\SmartServices.Core\Services\Cache\Imp\RedisCacheService.cs:line 114

Maybe there is some additional configuration setting for transparent failover?

ecobisi · 2023-01-16T07:56:42Z

@NickCraver Unfortunately, the issue is still there with the latest version 2.6.90.

Here is the stack trace from a recent exception we observed immediately after an automatic failover on a cluster running Redis 7.0.5:

StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:SETEX foo (BooleanProcessor)
---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: SETEX foo
at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1374
--- End of inner exception stack trace ---

Ideally, a dedicated exception type would make this kind of issue easier to detect on the caller side, and could possibly trigger an update of the cluster node map.
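Until something like that exists, a caller-side workaround can be sketched: detect the error by its message text, ask the client to refresh its configuration, and retry once. This is an assumption-laden sketch, not a documented fix. Matching on message text is the only signal visible in these traces, and the wording is not a stable API (it changed from "slave" to "replica" between versions). ReplicaRoutingRetry is a hypothetical helper written against plain delegates, so it does not depend on StackExchange.Redis internals.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical caller-side workaround; not part of StackExchange.Redis.
public static class ReplicaRoutingRetry
{
    // True if any exception in the chain carries the "cannot be issued to a
    // replica/slave" text seen in this thread. Matching on message text is an
    // assumption: the wording is not a stable API.
    public static bool IsReplicaRoutingError(Exception ex)
    {
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            string m = e.Message ?? "";
            if (m.IndexOf("cannot be issued to a replica", StringComparison.Ordinal) >= 0 ||
                m.IndexOf("cannot be issued to a slave", StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    // Runs the operation; on a replica-routing error, awaits refreshTopology and
    // tries again, up to maxAttempts attempts in total. Other errors propagate.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Task> refreshTopology,
        int maxAttempts = 2)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && IsReplicaRoutingError(ex))
            {
                await refreshTopology().ConfigureAwait(false);
            }
        }
    }
}
```

With StackExchange.Redis, this could be wired as ReplicaRoutingRetry.ExecuteAsync(() => db.KeyDeleteAsync(key), () => mpx.ConfigureAsync()). ConfigureAsync asks the multiplexer to re-run its configuration step; whether that clears the stale role on a given client version is an assumption to verify. The stack traces point at client-side checks (PrepareToPushMessageToBridge, WriteMessageToServerInsideWriteLock), which suggests the command is rejected before it is sent, so a single retry should not apply it twice. Treat that as an assumption too.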

Thanks for your hard work on SE.Redis!

Wyzzcow · 2023-06-01T15:13:08Z

Hi,
Indeed, this is still an issue even in 2.6.111.
What is the recommended way of handling a cluster failover?
It is trivial to reproduce locally with Docker, if that helps.

NickCraver added 🪲 bug ⚙️ area:cluster labels Jan 26, 2016

NickCraver added the ⚙️ area:failover label Nov 21, 2016

NickCraver closed this as completed Jun 1, 2018

Efp95 mentioned this issue Mar 18, 2019:
RedisTimeoutException when Cluster loses a node #1091 (closed)
