RedisCommandException SETEX Errors on Azure Redis Cluster #1504

tonycoelho · 2020-06-18T14:55:51Z

StackExchange.Redis version 2.1.30

This issue is related to #1172: SETEX errors occur when the client tries to write to a replica/slave while the master node is failing over on Azure Redis with clustering enabled. The steps to reproduce are:

1. Enable clustering on an Azure Redis instance configured with 2 shards
2. Start an application/script that writes to the cache, reads from it, and deletes from it in a recurring loop
3. Reboot the master node of one of the clustered shards, e.g. Shard0
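
For anyone trying to reproduce this, the loop in step 2 could look roughly like the sketch below. It is not the original application: the connection string is a placeholder, and `FailoverRepro` and `MakeKey` are hypothetical names, with the key prefix copied from the exception message. A string set with a whole-second expiry appears to be what the client sends as SETEX in the version reported here.

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class FailoverRepro
{
    // Hypothetical helper: key prefix modeled on the key in the exception message.
    public static string MakeKey() => "RedisCacheCheckKey" + Guid.NewGuid();

    public static async Task Main()
    {
        // Placeholder connection string; substitute your own cache name and access key.
        var muxer = await ConnectionMultiplexer.ConnectAsync(
            "<cache-name>.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False");
        IDatabase db = muxer.GetDatabase();

        while (true)
        {
            string key = MakeKey();
            try
            {
                // Write with an expiry, read back, then delete.
                await db.StringSetAsync(key, "probe", TimeSpan.FromSeconds(30));
                RedisValue value = await db.StringGetAsync(key);
                await db.KeyDeleteAsync(key);
                Console.WriteLine($"{DateTime.UtcNow:O} ok {key} value={value}");
            }
            catch (Exception ex) when (ex is RedisConnectionException
                                       || ex is RedisCommandException
                                       || ex is RedisTimeoutException)
            {
                // Log the failure and keep looping so the failover window is visible.
                Console.WriteLine($"{DateTime.UtcNow:O} FAIL {key} {ex.GetType().Name}: {ex.Message}");
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}
```

Run it, reboot the master node of one shard from the Azure portal, and watch for the FAIL lines; the timestamps show how long the errors last.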

This results in the following transient exceptions:
message=The PutItemAsync operation failed. Attempts: 6 Duration: 00:00:00.0018099 Exception: InternalFailure on SETEX RedisCacheCheckKeyd206b88c-1760-4e4b-a719-7aebcf7b41b3. exception=StackExchange.Redis.RedisConnectionException: InternalFailure on SETEX RedisCacheCheckKeyd206b88c-1760-4e4b-a719-7aebcf7b41b3
---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a slave: SETEX RedisCacheCheckKeyd206b88c-1760-4e4b-a719-7aebcf7b41b3
at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1303

In this specific case, the application executes the write operation with resiliency, retrying on any exception. The write operation ultimately failed after 6 attempts. Once the reboot of the master node on Shard0 completed, all operations succeeded again.

Per #1172, this was supposedly fixed in release 2.1.30, but we are still seeing this issue. Any guidance on how best to handle this error would be appreciated.

NickCraver · 2020-06-20T22:59:22Z

I'm not entirely sure what's being asked here - is the expectation that no errors occur while a failover executes? Or some interim state? Or is the issue that the target node was not a replica during the reboot, and was promoted to primary? I'm not sure what the in-between state is in Azure's setup.
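
One way to answer the "what state is the target node in" question is to dump what the client currently believes about each endpoint while the reboot is in progress. The following is a minimal sketch, assuming an already-connected multiplexer; `TopologyDump`, `Describe` and `Print` are hypothetical names, and the output reflects only the client's view, which may lag behind the real cluster state.

```csharp
using System;
using System.Net;
using StackExchange.Redis;

public static class TopologyDump
{
    // Formats one line per endpoint; kept separate so it can be checked without a server.
    public static string Describe(EndPoint endpoint, bool isConnected, bool isReplica) =>
        $"{endpoint} connected={isConnected} role={(isReplica ? "replica" : "primary")}";

    // Prints the client's current view of every known endpoint.
    public static void Print(IConnectionMultiplexer muxer)
    {
        foreach (EndPoint endpoint in muxer.GetEndPoints())
        {
            IServer server = muxer.GetServer(endpoint);
            Console.WriteLine(Describe(endpoint, server.IsConnected, server.IsReplica));
        }
    }
}
```

Calling `TopologyDump.Print(muxer)` before, during and after the reboot should show whether the node receiving the SETEX is still flagged as a replica by the client.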

tonycoelho · 2020-06-21T00:30:13Z

Hey @NickCraver, the question is: when the master node is down on a shard in a clustering configuration, why do writes to the slave/replica fail, and why doesn't the client try to use a master node on a different shard in the cluster for better resiliency? I'm trying to understand what we can do to make the system more resilient in cases like this, i.e. when a node is failing or being patched in Azure. Using a resiliency pattern like retry with exponential backoff doesn't help because writes continue to fail until the master node is healthy again.
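
For reference, the retry-with-exponential-backoff pattern mentioned above can be sketched as follows. This is not the application's actual retry code: `Retry`, `DelayFor` and `WithBackoffAsync` are hypothetical names and the delay values are illustrative. It is worth noting that the logged failure shows 6 attempts within roughly 1.8 ms, so whether retries can help at all depends on the total delay outlasting the failover, which this sketch does not guarantee.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Delay before the next try: baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    // Runs the operation, retrying transient failures up to maxAttempts times in total.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan baseDelay,
        TimeSpan maxDelay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts remaining: wait, then try again.
                await Task.Delay(DelayFor(attempt, baseDelay, maxDelay));
            }
            // On the final attempt, or for non-transient errors, the exception propagates.
        }
    }
}
```

With StackExchange.Redis, the `isTransient` predicate could be something like `ex => ex is RedisConnectionException || ex is RedisCommandException`, and the operation a lambda wrapping `db.StringSetAsync(...)`. A base delay of 200 ms with 6 attempts waits about 6 seconds in total, which may still be shorter than a node reboot.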

NickCraver · 2022-02-06T03:05:10Z

@tonycoelho There was a lot of digging here and I didn't update this issue, but one change we made was proactively recognizing topology changes in Azure when maintenance events happen. This was added in #1876, which should dramatically improve the recognition time for changes here.

Happy to reopen if this is still an issue, but overall: grab the latest client and it'll have the changes from #1876 to better handle this!

NickCraver added the ☁️ platform:Azure label May 20, 2021

NickCraver closed this as completed Feb 6, 2022

MarcinFrankowski mentioned this issue Jul 7, 2022

RedisCommandException during Azure Cluster failover #2181

Open
