org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! After restarting redis cluster. #5857

richardtippl-ext90046 · 2024-05-10T14:43:57Z

Spring Boot version: 3.2.5
Java version: 21 (also tested on 17) NOT using virtual threads
Redis version: 7.2.4
Redisson version: 3.30.0 (tested on 3.29.0, 3.28.0, 3.27.2)
Redisson configuration:

clusterServersConfig:
    password: ...
    nodeAddresses:
        - "redis://...:6379" # 6 addresses
    readMode: "MASTER"
    subscriptionMode: "SLAVE"
    masterConnectionMinimumIdleSize: 24
    masterConnectionPoolSize: 64
    slaveConnectionMinimumIdleSize: 24
    slaveConnectionPoolSize: 64
checkLockSyncedSlaves: false

We run a 6-node Redis cluster in Kubernetes. When performing a rolling restart of all nodes one by one, Redisson fails to reconnect properly after the final node restarts, and the Spring Boot Redis health check keeps failing.

Expected behavior
Rollout finishes and redisson continues communicating with newly started nodes.

Actual behavior
After rollout finishes, redisson starts spamming WriteRedisConnectionException and CancellationException until the application instance is restarted.
Truncated log: app.log

Steps to reproduce or test case
Spring boot application using Redisson with enabled redis health check on application readiness probe.
Perform a rollout of all 6 redis cluster nodes.

As far as I can tell, this behavior started between version 3.27.2 and 3.28.0.
3.27.2 is OK, any version afterwards is NOK.

hongkong9771 · 2024-05-23T08:47:43Z

I tried version 3.27.2, but it didn't work either.

richardtippl-ext90046 · 2024-05-23T10:35:21Z

Odd, our cases seem to be a bit different. The logs do complain about CPU, but the actual CPU usage of the app stays the same.
(The instance I'm testing on is otherwise idle, with no requests besides health pings.)
Also, in our case the issue seems to be "solved" by rolling back to 3.27.2, but we cannot update to any newer version.

mrniko · 2024-05-30T06:20:51Z

@richardtippl-ext90046

ERROR org.redisson.command.RedisExecutor : org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! Check CPU usage of the JVM. Try to increase nettyThreads setting. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@999038644 [redisClient=[addr=redis://100.96.94.107:6379], channel=[id: 0x5688a424, L:/100.96.166.134:42194 ! R:100.96.94.107/100.96.94.107:6379], currentCommand=null, usage=1], command: (CLUSTER INFO), params: [] after 3 retry attempts

Does this error appear for CLUSTER INFO command only?

Is 100.96.94.107:6379 offline?

mrniko · 2024-05-30T07:14:45Z

Can anyone try the attached version?

redisson-3.30.1-SNAPSHOT.jar.zip

richardtippl-ext90046 · 2024-05-30T07:53:47Z

I can't really comment on other commands, because when the instance cannot pass the above health check, it stops accepting other requests.

As for 100.96.94.107:6379: on Kubernetes, rebooting an instance brings it back up with a different IP. The other Redis nodes update the IP they hold for that node, but post-3.27 Redisson does not seem to update it and ends up trying to ping a node that is no longer present at that address.

I'll try to test with the version above.
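
To illustrate the address change described above, here is a minimal sketch of detecting it from two `CLUSTER NODES` snapshots. It is not Redisson's implementation. It assumes node IDs survive the pod restart (for example because `nodes.conf` is persisted); the class name, the short node IDs, and the new IP in `main` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: tracks cluster nodes by node ID rather than by IP address.
final class ClusterNodeAddresses {

    // Parses CLUSTER NODES output into nodeId -> "host:port".
    // Each line starts with: <id> <ip:port@cport[,hostname]> <flags> ...
    static Map<String, String> parse(String clusterNodesOutput) {
        Map<String, String> byId = new HashMap<>();
        for (String line : clusterNodesOutput.split("\\R")) {
            String[] fields = line.trim().split(" ");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            String address = fields[1];
            int at = address.indexOf('@');
            if (at >= 0) {
                address = address.substring(0, at); // drop cluster bus port and hostname
            }
            byId.put(fields[0], address);
        }
        return byId;
    }

    // Returns the nodes whose ID is already known but whose address differs in the fresh snapshot.
    static Map<String, String> changed(Map<String, String> known, Map<String, String> fresh) {
        Map<String, String> moved = new HashMap<>();
        fresh.forEach((id, address) -> {
            String old = known.get(id);
            if (old != null && !old.equals(address)) {
                moved.put(id, address);
            }
        });
        return moved;
    }

    public static void main(String[] args) {
        // Hypothetical snapshots: node "aaa" comes back with a new IP after a restart.
        String before = "aaa 100.96.94.107:6379@16379 master - 0 0 1 connected 0-5460";
        String after = "aaa 100.96.12.34:6379@16379 master - 0 0 2 connected 0-5460";
        System.out.println(changed(parse(before), parse(after)));
    }
}
```

A client that keys its connection entries by node ID could use such a diff to reconnect to the new address instead of retrying the old one.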

richardtippl-ext90046 · 2024-05-30T10:52:08Z

@mrniko

One thing to mention: we have Redis configured so that, before termination, a master node first asks its corresponding slave node to fail over, so that the node being terminated is never a master.
This configuration was disabled when the log above was captured.

My first test with 3.30.1-SNAPSHOT was with this configuration enabled.
Health checks didn't fail, but Redisson complains a lot about duplicated keys for nodes.
This stops once the restarts are over.
app2.log

The second test was again with this configuration turned off.
Once a master node is restarted, Redisson again starts complaining about connections to it,
but not forever as before: once the node reboots or is failed over, the errors stop and health checks pass.
Complaints about duplicated keys are also present.
app3.log

mrniko · 2024-05-31T05:22:54Z

@richardtippl-ext90046

Thanks much for testing. Please try the new build below.

redisson-3.30.1-SNAPSHOT.jar.zip

richardtippl-ext90046 · 2024-05-31T11:19:59Z

@mrniko
Performed two more retests.
There's no traffic on the instance besides the health pings.

First test, with the role switching before shutdown disabled:
The rollout started well. During it there were some health check failures,
but those are expected if you terminate a master node (which is why we do the switches).
Once the nodes failed over or rebooted, all health pings have succeeded.
BUT if I trigger a second reboot after the first one has finished, Redisson immediately fails to connect and falls into the error loop once again.
app4.log

Second test, with the special failover enabled:
Redisson seems to have failed over correctly and there were no health check failures.
There were also no complaints about duplicate keys.
BUT once more, initiating a second rollout leads to a broken instance that isn't able to recover by itself.
app5.log

mrniko · 2024-05-31T17:09:32Z

Please try the new build. I added more log output about the cluster nodes that reported an exception.

redisson-3.31.1-SNAPSHOT.jar.zip

hongkong9771 mentioned this issue May 23, 2024

redisson netty consumes high cpu #5897

Open

sclasen mentioned this issue May 29, 2024

((RedisCluster) redissonClient.getRedisNodes(redisNodes)).getMasters() does not appear to track changes to cluster topology, should it? #5899

Closed

mrniko added this to the 3.30.1 milestone May 29, 2024

mrniko pushed a commit that referenced this issue May 30, 2024

Fixed - cluster topology scan shouldn't be stopped by any exception. #5857

4e6c0be

mrniko pushed a commit that referenced this issue May 30, 2024

Fixed - Cluster failover handling #5857

c86af4e

mrniko added the bug label May 30, 2024

mrniko pushed a commit that referenced this issue May 31, 2024

Revert "Fixed - Cluster failover handling #5857"

7b2c189

This reverts commit c86af4e.

mrniko pushed a commit that referenced this issue May 31, 2024

Fixed - Cluster failover handling #5857

4bf4cb1

mrniko modified the milestones: 3.31.0, 3.31.1 May 31, 2024
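
The commit message "cluster topology scan shouldn't be stopped by any exception" names a general pitfall with periodic tasks on the JVM: a task scheduled on a `ScheduledExecutorService` is not run again once one execution throws. The sketch below shows that general pattern only and is not the code from commit 4e6c0be; the class name, the always-failing scan, and the 10 ms delay are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: keeps a periodic scan alive even when a single run fails.
final class ResilientTopologyScan {

    // Wraps a scan so a runtime exception is logged instead of reaching the scheduler,
    // which would otherwise suppress all subsequent executions.
    static Runnable guarded(Runnable scan) {
        return () -> {
            try {
                scan.run();
            } catch (RuntimeException e) {
                System.err.println("topology scan failed, will retry: " + e.getMessage());
            }
        };
    }

    // Schedules a scan that always fails and returns how many times it ran
    // before the wanted count was reached or the timeout expired.
    static int runFailingScan(int wantedRuns, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(wantedRuns);
        Runnable failingScan = () -> {
            runs.incrementAndGet();
            done.countDown();
            throw new IllegalStateException("node unreachable");
        };
        try {
            scheduler.scheduleWithFixedDelay(guarded(failingScan), 0, 10, TimeUnit.MILLISECONDS);
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("scan runs despite failures: " + runFailingScan(3, 2000));
    }
}
```

Without the `guarded` wrapper, the same scan would run exactly once, which matches the symptom of a client that never notices later topology changes.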
