org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! After restarting redis cluster. #5857

Open
richardtippl-ext90046 opened this issue May 10, 2024 · 9 comments

@richardtippl-ext90046

Spring Boot version: 3.2.5
Java version: 21 (also tested on 17) NOT using virtual threads
Redis version: 7.2.4
Redisson version: 3.30.0 (tested on 3.29.0, 3.28.0, 3.27.2)
Redisson configuration:

```yaml
clusterServersConfig:
    password: ...
    nodeAddresses:
        - "redis://...:6379" # 6 addresses
    readMode: "MASTER"
    subscriptionMode: "SLAVE"
    masterConnectionMinimumIdleSize: 24
    masterConnectionPoolSize: 64
    slaveConnectionMinimumIdleSize: 24
    slaveConnectionPoolSize: 64
checkLockSyncedSlaves: false
```

We run a 6-node Redis cluster in Kubernetes. When we perform a rolling restart of all nodes one by one, Redisson fails to reconnect properly after the final node has restarted, which shows up in the Spring Boot Redis health check.

Expected behavior
The rollout finishes and Redisson continues communicating with the newly started nodes.

Actual behavior
After the rollout finishes, Redisson continuously logs WriteRedisConnectionException and CancellationException until the application instance is restarted.
Truncated log: app.log

Steps to reproduce or test case
A Spring Boot application using Redisson, with the Redis health check enabled on the application's readiness probe.
Perform a rollout of all 6 Redis cluster nodes.
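For reference, one way to put the Redis health check on the readiness probe is through Spring Boot's standard health-group properties. This is a sketch, not the reporter's actual `application.yml` (which isn't shown in the issue); the group membership is an assumption. As far as we can tell, Spring Boot's Redis health indicator queries cluster info when it gets a cluster connection, which matches the `(CLUSTER INFO)` command in the logged error.

```yaml
# Hypothetical application.yml fragment: adds the Redis health indicator
# to the readiness group served at /actuator/health/readiness
management:
  endpoint:
    health:
      probes:
        enabled: true                     # expose the liveness/readiness groups
      group:
        readiness:
          include: "readinessState,redis" # readiness reports DOWN when Redis health is DOWN
```

With this wiring, a Redisson connection that never recovers keeps the pod out of service, which is why the instance stops receiving traffic.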

As far as I can tell, this behavior was introduced between versions 3.27.2 and 3.28.0.
3.27.2 works; every later version fails.

@hongkong9771

I tried version 3.27.2, but it didn't work either.

@richardtippl-ext90046
Author

> I tried version 3.27.2, but it didn't work either.

Odd, our cases seem to be a bit different. The logs do complain about CPU, but the app's actual CPU usage stays the same
(the instance I'm testing on is otherwise idle, with no requests besides health pings).
Also, in our case the issue seems to be "solved" by rolling back to 3.27.2, but that means we cannot upgrade to any newer version.

@mrniko
Member

mrniko commented May 30, 2024

@richardtippl-ext90046

ERROR org.redisson.command.RedisExecutor : org.redisson.client.WriteRedisConnectionException: Unable to write command into connection! Check CPU usage of the JVM. Try to increase nettyThreads setting. Node source: NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null], connection: RedisConnection@999038644 [redisClient=[addr=redis://100.96.94.107:6379], channel=[id: 0x5688a424, L:/100.96.166.134:42194 ! R:100.96.94.107/100.96.94.107:6379], currentCommand=null, usage=1], command: (CLUSTER INFO), params: [] after 3 retry attempts

Does this error appear for the CLUSTER INFO command only?

Is 100.96.94.107:6379 offline?
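The error message recommends increasing `nettyThreads`. In Redisson's YAML format that is a top-level setting, a sibling of `clusterServersConfig` rather than a child of it. The sketch below is based on the configuration from the issue; the value 64 is illustrative only, and since the reporter saw no rise in actual CPU usage, it may well not address this particular failure.

```yaml
# Top-level Redisson setting, at the same level as checkLockSyncedSlaves
nettyThreads: 64   # illustrative value, not a recommendation
clusterServersConfig:
    password: ...
    nodeAddresses:
        - "redis://...:6379" # 6 addresses
    readMode: "MASTER"
    subscriptionMode: "SLAVE"
    # connection pool settings unchanged from the issue
checkLockSyncedSlaves: false
```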

@mrniko
Member

mrniko commented May 30, 2024

Can anyone try the attached version?

redisson-3.30.1-SNAPSHOT.jar.zip

@richardtippl-ext90046
Author

> Does this error appear for the CLUSTER INFO command only?
>
> Is 100.96.94.107:6379 offline?

I can't really comment on other commands, because when the instance cannot pass the above health check, it stops accepting other requests.

As for 100.96.94.107:6379: on our k8s platform, a restarted instance comes back up with a different IP. The other Redis nodes update their record of that node's IP, but after 3.27 Redisson seems not to update it, and it keeps trying to ping a node that no longer exists.

I'll try to test with the version above.
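Redisson picks up cluster topology changes, such as a node coming back with a new IP, by periodically re-reading the cluster state at the interval set by `scanInterval` (milliseconds) in `clusterServersConfig`. A sketch of where the setting goes; the value is illustrative, and whether this interval has anything to do with the regression is our assumption, since the thread doesn't establish a root cause.

```yaml
clusterServersConfig:
    # How often (ms) Redisson re-reads the cluster topology; illustrative value
    scanInterval: 2000
    nodeAddresses:
        - "redis://...:6379" # 6 addresses
    # remaining settings unchanged from the issue
```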

@richardtippl-ext90046
Author

@mrniko

One thing to mention: we have Redis configured so that, before termination, a master node first asks its corresponding slave node to fail over,
so that the node being terminated is no longer a master.
This configuration was disabled for the log above.
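The reporter's failover-before-termination setup isn't shown in the thread. Below is a minimal sketch of how such a step is commonly done in Kubernetes, assuming a `preStop` hook on the Redis container, a `REDIS_PASSWORD` environment variable, and `redis-cli` in the image (all assumptions). If the node is a master, it asks its first connected replica to run `CLUSTER FAILOVER`, then waits for the node to be demoted.

```yaml
# Hypothetical fragment of the Redis container spec in the StatefulSet
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # redis-cli reads the password from REDISCLI_AUTH
          export REDISCLI_AUTH="$REDIS_PASSWORD"
          # Only a master needs to hand over its role before shutdown
          if redis-cli role | head -n 1 | grep -q master; then
            # IP of the first connected replica, from "slave0:ip=...,port=..."
            REPLICA_IP=$(redis-cli info replication | grep '^slave0:' | sed 's/.*ip=\([^,]*\).*/\1/')
            if [ -n "$REPLICA_IP" ]; then
              # CLUSTER FAILOVER is sent to the replica, which promotes itself
              redis-cli -h "$REPLICA_IP" cluster failover
              # Wait up to ~30 s for this node to report itself as a replica
              for i in $(seq 1 30); do
                redis-cli role | head -n 1 | grep -q slave && break
                sleep 1
              done
            fi
          fi
```

The pod's `terminationGracePeriodSeconds` has to leave room for this wait, otherwise the container is killed before the failover completes.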

My first test with 3.30.1-SNAPSHOT was with this enabled.
Health checks didn't fail, but Redisson logged many complaints about duplicated keys for nodes;
these stopped once the restarts were over.
app2.log

The second test was again with this configuration turned off.
Once a master node was restarted, Redisson again started complaining about connections to it,
but not forever as before: once the node rebooted or was failed over, the errors stopped and health checks passed.
Complaints about duplicated keys were also present.
app3.log

@mrniko
Copy link
Member

mrniko commented May 31, 2024

@richardtippl-ext90046

Thanks much for testing. Please try the new build below.

redisson-3.30.1-SNAPSHOT.jar.zip

mrniko pushed a commit that referenced this issue May 31, 2024
mrniko pushed a commit that referenced this issue May 31, 2024
@richardtippl-ext90046
Author

@mrniko
I performed two more retests.
There's no traffic on the instance besides the health pings.

First test, with the role switching before shutdown disabled:
The rollout started well. There were some health failures during it,
but those are expected when you terminate a master node (which is why we do the switches).
Once the nodes failed over or rebooted, all health pings succeeded from then on.
BUT if I trigger a second reboot after the first one has finished, Redisson immediately fails to connect to the node and falls into the error loop once again.
app4.log

Second test, with the special failover enabled:
Redisson seems to have failed over correctly and there were no health failures,
and also no complaints about duplicate keys.
BUT once more, initiating a second rollout leaves the instance broken and unable to recover on its own.
app5.log

mrniko modified the milestones: 3.31.0, 3.31.1 May 31, 2024
@mrniko
Copy link
Member

mrniko commented May 31, 2024

Please try the new build. I added more logging about the cluster nodes that reported the exception.

redisson-3.31.1-SNAPSHOT.jar.zip
