Restart Action Cable when Redis reconnect_attempt limit is reached #51687
when the ActionCable reconnect_attempt limit is reached, the server wedges and continues to accept connections but cannot deliver messages since Redis is disconnected
Oh, that's definitely a bug; thanks for catching!
The benefit of doing a restart is that it announces to the clients that the server is restarting
Yeah, but restarting Action Cable doesn't do anything with the application state itself. What could be the cause of Redis connectivity issues? If we assume that other instances are doing well (and that's why we hope for clients to re-connect to them) then it probably means that something is wrong with the current instance configuration or networking. Hence, restarting Action Cable wouldn't help—this instance would still be broken (and non-detectable by health-checks, though this is a different topic).
What I'm thinking about here is whether we should crash the whole process (Kernel.exit or whatever) instead of doing a virtual restart?
ActionCable.server.logger.info("Redis reconnect_attempts limit exceeded; restarting.")
@thread = @raw_client = nil
ActionCable.server.restart
We shouldn't use singleton here (there can be multiple servers, e.g., managed by third-party gems). Every subscription adapter keeps a reference to its server in the @server instance variable (server reader is available). We should use it here.
Also, no need to reset @thread, @raw_client instance variables—the whole pubsub object is re-created on restart.
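To illustrate the point about the singleton, here is a minimal sketch using hypothetical stand-in classes (`FakeServer` and `SketchAdapter` are not the real Action Cable classes). It assumes only what is said above: the adapter holds the server it was created with and exposes it through a `server` reader, so a restart can target that server rather than a global one.

```ruby
# Hypothetical stand-ins for illustration; not the real Action Cable classes.
class FakeServer
  attr_reader :name, :restarts

  def initialize(name)
    @name = name
    @restarts = 0
  end

  # Records that a restart was requested on this particular server.
  def restart
    @restarts += 1
  end
end

class SketchAdapter
  # Each adapter keeps a reference to the server that owns it.
  attr_reader :server

  def initialize(server)
    @server = server
  end

  # Hypothetical hook, called once the reconnect attempt limit is exhausted.
  def reconnect_limit_reached
    server.restart
  end
end

primary = FakeServer.new("primary")
secondary = FakeServer.new("secondary")

# Only the server that owns this adapter is restarted.
SketchAdapter.new(secondary).reconnect_limit_reached
```

With two servers in the same process, a call to a global singleton could restart the wrong one; going through the adapter's own reference avoids that.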
server reader is available). We should use it here.
👍 I think we can change this to @adapter.server.restart
Also, no need to reset @thread, @raw_client instance variables—the whole pubsub object is re-created on restart.
I think you are correct about @raw_client. But we do need to reset @thread because shutdown tries to unsubscribe if @thread is not nil, and that won't succeed if Redis is down.
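A rough model of that interaction, with a hypothetical `SketchListener` class standing in for the real listener (the method and variable names mirror this discussion, not the actual source). It assumes only the behavior described above: `shutdown` attempts an unsubscribe whenever `@thread` is set.

```ruby
# Hypothetical model of the listener; not the real Action Cable implementation.
class SketchListener
  attr_reader :unsubscribe_attempts

  def initialize
    @thread = :running            # stands in for the live listener thread
    @unsubscribe_attempts = 0
  end

  # Mirrors the behavior described above: unsubscribe only if a thread exists.
  def shutdown
    return if @thread.nil?

    @unsubscribe_attempts += 1    # with Redis down, this step could not succeed
    @thread = nil
  end

  # Hypothetical helper: drop the thread reference before restarting.
  def forget_thread
    @thread = nil
  end
end

stale = SketchListener.new
stale.shutdown                    # still tries to unsubscribe

reset = SketchListener.new
reset.forget_thread
reset.shutdown                    # skips the unsubscribe entirely
```

Clearing `@thread` first means the restart path never tries to talk to a Redis connection that is already known to be gone.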
That is an option, though it is nice to send a message to clients ( |
7233f25 to 43ba41d
43ba41d to 9f547ec
@palkan and others, do you have any more feedback on this PR?
Motivation / Background
This Pull Request has been created because when the ActionCable reconnect_attempt limit is reached, the server wedges and continues to accept connections but cannot deliver messages since Redis is disconnected.
Detail
This is based on #45478 and #44626 and building on #46562. #46562 added a Redis reconnect option, but when the number of reconnect attempts is exceeded the server cannot currently recover.
This Pull Request changes the ActionCable Redis subscription adapter to restart the ActionCable server when the number of reconnect_attempts is reached. The benefit of doing a restart is that it announces to the clients that the server is restarting. Combined with a health check that takes the server out of service (in the event of an extended Redis outage), this can allow clients to migrate to a healthy server.
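As a sketch of the kind of health check this could be paired with (not part of this PR), the following builds a Rack-style endpoint that reports 503 while Redis is unreachable. `build_health_check` and the injected `ping` callable are hypothetical; in a real app the callable would wrap whatever Redis client the app already uses.

```ruby
# Hypothetical Rack-style health check. `ping` is injected so the sketch
# does not depend on any particular Redis client.
def build_health_check(ping)
  lambda do |_env|
    healthy = begin
      ping.call == "PONG"
    rescue StandardError
      false                       # any connection error counts as unhealthy
    end

    status = healthy ? 200 : 503
    body = healthy ? "ok" : "redis unavailable"
    [status, { "content-type" => "text/plain" }, [body]]
  end
end

# Simulated healthy and unhealthy Redis connections.
up = build_health_check(-> { "PONG" })
down = build_health_check(-> { raise IOError, "connection refused" })
```

A load balancer polling such an endpoint would stop routing to the instance during an extended outage, so clients that reconnect after the restart announcement land on a healthy server.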
cc @palkan @MrChrisW @engwan