Unable to connect to Redis due to EOF #694
Hi @zetaron, thanks for opening this issue. Kodiak uses a slightly modified fork of asyncio-redis, which doesn't have TLS support built in. I think it's possible to update the library to support TLS; it's built on asyncio. There are other asyncio Redis clients around, but I don't know how hard they would be to integrate. I'm open to PRs that add TLS support. Another option is using something like DigitalOcean's private networking feature to isolate traffic.

Kodiak uses Redis to store merge queue positions, so if your Redis instance were to disappear, Kodiak would lose track of any PR waiting to be merged.
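To sketch what adding TLS to an asyncio-based client would involve (the helper name below is illustrative, not asyncio-redis's actual API): asyncio's `loop.create_connection()` already accepts an `ssl.SSLContext`, so the bulk of the work is threading such a context through the client's connection setup.

```python
import ssl

def make_redis_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build a client-side TLS context suitable for a managed Redis.

    Hypothetical helper: an asyncio Redis client would pass the result to
    loop.create_connection(..., ssl=ctx) when given a rediss:// URL.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if not verify:
        # Some setups use self-signed certs; only disable verification
        # if you understand the risk.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx

# The connection call would then look roughly like:
#   await loop.create_connection(protocol_factory, host, port,
#                                ssl=make_redis_ssl_context())
ctx = make_redis_ssl_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

With `create_default_context`, certificate and hostname verification are on by default, which is what a managed provider's TLS endpoint expects.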
Thanks for your quick reply :) I'll attempt to update your fork of asyncio-redis. DigitalOcean's private networking still (from what I can see) requires TLS to connect.
I've created two PRs to implement TLS support. I did test the change locally using docker-compose against my DigitalOcean managed Redis, and the above error did not happen again.
In response to chdsbd/kodiak#694, this PR is the preparation for TLS support in Kodiak's bot.
In response to #694, this PR is the preparation for TLS support in Kodiak's bot. This PR requires chdsbd/asyncio-redis#6 to be merged first.
Thanks for the PRs! You should be all set with Redis and TLS using the latest commit: f487e79
Great :D thanks for the quick merge! Would you mind also triggering a release for the docker image? |
CI automatically builds and releases a Docker image with the latest commit SHA, so if you navigate to https://hub.docker.com/r/cdignam/kodiak/tags, the most recently released version is listed there.
Ah, that slipped past me when I checked.
Hi again :) I've deployed the new image and after some time observed the error again (logs attached: logs-from-server-in-kodiak-hlztb-deployment-6477778f9b-vwvx7.txt)
I believe the EOF comes from the `eof_received` method on the asyncio protocol: https://docs.python.org/3.9/library/asyncio-protocol.html#asyncio.Protocol.eof_received

So my first thought is that Redis is killing the connections after a while and the Python client is recreating the connection when it dies. Do you have timestamps to determine how long after the initial pool creation the connections die? That might shed some light on whether it's timeout related.
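For what it's worth, the `eof_received` behavior can be demonstrated with a toy loopback server standing in for Redis. This is a sketch of the asyncio semantics only, not Kodiak's actual client code; all names here are illustrative.

```python
import asyncio

class RedisLikeProtocol(asyncio.Protocol):
    """Minimal client protocol showing asyncio's EOF semantics."""

    def __init__(self, on_lost: asyncio.Future):
        self.on_lost = on_lost
        self.got_eof = False

    def eof_received(self):
        # The peer half-closed the stream; this is what a server-side
        # idle timeout (like Redis's `timeout`) looks like to the client.
        self.got_eof = True
        return None  # falsy return => asyncio closes the transport for us

    def connection_lost(self, exc):
        if not self.on_lost.done():
            self.on_lost.set_result(self.got_eof)

async def main() -> bool:
    loop = asyncio.get_running_loop()

    # Toy server standing in for Redis: it drops every connection
    # immediately, as if the idle timeout had already expired.
    async def drop_connection(reader, writer):
        writer.close()

    server = await asyncio.start_server(drop_connection, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    on_lost = loop.create_future()
    await loop.create_connection(
        lambda: RedisLikeProtocol(on_lost), "127.0.0.1", port
    )
    got_eof = await on_lost  # resolves once the server kills the connection

    server.close()
    await server.wait_closed()
    return got_eof

if __name__ == "__main__":
    print(asyncio.run(main()))
```

A pooled client typically reacts to `connection_lost` by reconnecting, which would match the repeated EOF lines in the logs.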
I'll see if I can get some; in the UI there are timestamps, but the downloaded file seems to be missing them. Oddly enough, from what I observed it was related to events being handled, as it only happened after the first events started coming in, no matter how long that took.
After following your hint on timeouts and some research on Redis timeouts in relation to DigitalOcean Managed Redis, I found a question on how to set it to 0 (disabled). The default timeout for the managed Redis seems to be 300 seconds. I've gathered some more logs, but apart from an overwhelming number of the following lines, occurring 2-3 seconds apart, there is nothing new; that now makes total sense given the default connection timeout. I'll try to have DigitalOcean set it to 0 on my Kodiak instance by opening a ticket with them. Sorry for the commotion, but maybe this helps someone else stumbling into similar issues :)
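For reference, the idle disconnect is controlled by Redis's `timeout` directive (measured in seconds, with 0 meaning "never disconnect idle clients"). On a self-managed instance you can change it yourself via redis.conf or `CONFIG SET timeout 0`; a managed offering like DigitalOcean's may only change it through support, as here. A minimal redis.conf fragment:

```conf
# redis.conf: close a client connection after it has been idle N seconds.
# 0 disables the idle timeout entirely.
timeout 0
```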
After DigitalOcean set the timeout to 0, the issue was resolved :)
When configuring a self-hosted Kodiak bot to connect to a Redis instance managed by DigitalOcean, it fails with the following error message on repeat:
From the DigitalOcean FAQ I take it that the EOF error usually happens when the connecting client does not support TLS.
Is there a way to use TLS or would that require switching to the official redis library?
If switching would be required, how hard would that be?
Would you be open to PRs?
How valuable is the data that's stored inside Redis?
Would it be bad to lose it?
We are hosting on Kubernetes and I could deploy Redis in memory ... but that would be subject to potential rescheduling and movement across cluster nodes.