Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart Action Cable when Redis reconnect_attempt limit is reached #51687

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

buckelij
Copy link

Motivation / Background

This Pull Request has been created because when the ActionCable reconnect_attempt limit is reached, the server wedges and continues to accept connections but cannot deliver messages since Redis is disconnected.

Detail

This is based on #45478 and #44626 and building on #46562. #46562 added a Redis reconnect option, but when the number of reconnect attempts is exceeded the server cannot currently recover.

This Pull Request changes the ActionCable Redis subscription adapter to restart the ActionCable server when the number of reconnect_attempts is reached. The benefit of doing a restart is that it announces to the clients that the server is restarting. Combined with a health check that takes the server out of service (in the event of an extended Redis outage), this can allow clients to migrate to a healthy server.

cc @palkan @MrChrisW @engwan

Additional information

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Unrelated changes should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

Copy link
Contributor

@palkan palkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the ActionCable reconnect_attempt limit is reached, the server wedges and continues to accept connections but cannot deliver messages since Redis is disconnected

Oh, that's definitely a bug; thanks for catching!

The benefit of doing a restart is that it announces to the clients that the server is restarting

Yeah, but restarting Action Cable doesn't do anything with the application state itself. What could be the cause of Redis connectivity issues? If we assume that other instances are doing well (and that's why we hope for clients to re-connect to them) then it probably means that something is wrong with the current instance configuration or networking. Hence, restarting Action Cable wouldn't help—this instance would still be broken (and non-detectable by health-checks, though this is a different topic).

What I'm thinking about here is if we should crash the whole process instead (Kernel.exit or whatever) instead of doing a virtual restart?

Comment on lines 173 to 175
ActionCable.server.logger.info("Redis reconnect_attempts limit exceeded; restarting.")
@thread = @raw_client = nil
ActionCable.server.restart
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use singleton here (there can be multiple servers, e.g., managed by third-party gems). Every subscription adapter keeps a reference to its server in the @server instance variable (server reader is available). We should use it here.

Also, no need to reset @thread, @raw_client instance variables—the whole pubsub object is re-created on restart.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server reader is available). We should use it here.

👍 I think we can change this to @adapter.server.restart

Also, no need to reset @thread, @raw_client instance variables—the whole pubsub object is re-created on restart.

I think you are correct about @raw_client. But we do need to reset @thread because shutdown tries to unsubscribe if @thread is not nil, and that won't succeed if Redis is down.

@buckelij
Copy link
Author

What I'm thinking about here is if we should crash the whole process instead (Kernel.exit or whatever) instead of doing a virtual restart?

That is an option, though it is nice to send a message to clients (connection.close(reason: ActionCable::INTERNAL[:disconnect_reasons][:server_restart])). The other downside of exiting abrubtly is that if the Rails process is also serving non-ActionCable traffic, that might be more disruptive than what we want. I think taking the process out of service can be the responsibility of an upstream load balancer (via a healthcheck).

@buckelij buckelij force-pushed the actioncable-restart branch 3 times, most recently from 7233f25 to 43ba41d Compare May 1, 2024 02:45
@buckelij
Copy link
Author

@palkan and others, do you have any more feedback on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants