Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to connect to Redis due to EOF #694

Closed
zetaron opened this issue Jul 15, 2021 · 12 comments
Closed

Unable to connect to Redis due to EOF #694

zetaron opened this issue Jul 15, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@zetaron
Copy link
Contributor

zetaron commented Jul 15, 2021

When configuring a self-hosted kodiak bot to connect to a Redis Instance managed by DigitalOcean it fails with the following error message on repeat:

INFO asyncio_redis:connection.py:116 Connecting to redis
INFO asyncio_redis:protocol.py:897 Redis connection made
INFO asyncio_redis:protocol.py:955 EOF received in RedisProtocol
INFO asyncio_redis:protocol.py:979 Redis connection lost
ERROR asyncio:main.py:306 Task exception was never retrieved
future: <Task finished coro= exception=ConnectionLostError(None)>
Traceback (most recent call last):
  File "/var/app/.venv/src/asyncio-redis/asyncio_redis/protocol.py", line 911, in initialize
    yield from self.auth(self.password)
  File "/var/app/.venv/src/asyncio-redis/asyncio_redis/protocol.py", line 733, in wrapper
    result = yield from method(protocol_self, _NoTransaction, *a, **kw)
  File "/var/app/.venv/src/asyncio-redis/asyncio_redis/protocol.py", line 1231, in _query
    transaction, answer_f, _bypass=_bypass, call=call
  File "/var/app/.venv/src/asyncio-redis/asyncio_redis/protocol.py", line 1155, in _get_answer
    result = yield from answer_f
asyncio_redis.exceptions.ConnectionLostError: None

From the DigitalOcean FAQ I take that the EOF error would usualy happen when the connecting client does not support TLS.

Is there a way to use TLS or would that require switching to the official redis library?
If switching would be required, how hard would that be?
Would you be open to PRs?


How valuable is the data thats stored inside the redis?
Would it be bad to loose it?

We are hosting on Kubernetes and I could deploy redis in memory ... but that would be subject to potential rescheduling and movement accross cluster nodes.

@zetaron zetaron added the bug Something isn't working label Jul 15, 2021
@chdsbd
Copy link
Owner

chdsbd commented Jul 16, 2021

Hi @zetaron,

Thanks for opening this issue.

Kodiak uses a slightly modified fork of asyncio-redis, which doesn't have TLS support built in.

I think it's possible to update the library to support TLS. It uses the asyncio loop.create_connection, which supports TLS.

https://github.com/chdsbd/asyncio-redis/blob/7d174e0ebe2dc52340001097eeee48a868454480/asyncio_redis/connection.py#L118-L120

There are other asyncio Redis clients around, but I don't know how hard they would be to integrate. I'm open to PRs that add TLS support.

One alternative is to use stunnel to create a connection using TLS between hosts without needed to update the Redis client.

Another option is using something like Digital Ocean's private networking feature to isolate traffic.

Kodiak uses Redis to store merge queue positions. So if your Redis instance were to disappear, Kodiak would lose track of any PR waiting to be merged.

@zetaron
Copy link
Contributor Author

zetaron commented Jul 16, 2021

Thanks for your quick reply :)

I'll attempt to update your fork of the asyncio_redis library to support TLS, huge thanks for the pointers.

DigitalOceans private networking still (from what I see) requires TLS to connect.

@zetaron
Copy link
Contributor Author

zetaron commented Jul 16, 2021

I've created two PRs to implement TLS support:

I did test the change locally using docker-compose against my DigitalOcean managed Redis and the above error did not happen again.
Since I'm not too familiar with python and this project in general I was unable to add unit-tests. If that is something you want please feel free to modify my PRs :)

kodiakhq bot pushed a commit to chdsbd/asyncio-redis that referenced this issue Jul 16, 2021
In response to chdsbd/kodiak#694 this PR is the preparation for TLS support in kodiaks bot.
kodiakhq bot pushed a commit that referenced this issue Jul 16, 2021
In response to #694 this PR is the preparation for TLS support in kodiaks bot. 

This PR requires chdsbd/asyncio-redis#6 to be merged first.
@chdsbd
Copy link
Owner

chdsbd commented Jul 16, 2021

Thanks for the PRs!

You should be all set with Redis and TLS using the latest commit: f487e79

@chdsbd chdsbd closed this as completed Jul 16, 2021
@zetaron
Copy link
Contributor Author

zetaron commented Jul 17, 2021

Great :D thanks for the quick merge!

Would you mind also triggering a release for the docker image?

@sbdchd
Copy link
Collaborator

sbdchd commented Jul 17, 2021

CI automatically builds and releases a docker image with the latest commit SHA, so if you nav to https://hub.docker.com/r/cdignam/kodiak/tags, the most recently released version is cdignam/kodiak:f487e799a74ae074ee16c9106c9750925d329be4

@zetaron
Copy link
Contributor Author

zetaron commented Jul 17, 2021

Ah that slipped me when I checked.

@zetaron
Copy link
Contributor Author

zetaron commented Jul 18, 2021

Hi, again :)

I've deployed the new Image and after some time observed the EOF coming back but seemingly recovering.
Maybe you know of the top of your head how many locations there are where a redis connection gets created/recreated?
Or if the recreation of the connection is expected after it got used for handling an event?
I'm currently using a Pool Size of 10.

logs-from-server-in-kodiak-hlztb-deployment-6477778f9b-vwvx7.txt

@sbdchd sbdchd reopened this Jul 18, 2021
@sbdchd
Copy link
Collaborator

sbdchd commented Jul 18, 2021

I believe the EOF method on the asyncio.Protocol is called when the server sends an EOF

https://github.com/chdsbd/asyncio-redis/blob/388a6ac390ae70ee16bdc45cf19b308d11212566/asyncio_redis/protocol.py#L954-L956

https://docs.python.org/3.9/library/asyncio-protocol.html#asyncio.Protocol.eof_received

So my first thought is Redis is killing the connections after a while and the python client is recreating the connection when it dies

Do you have timestamps to determine how long after the initial pool creation the connections die? Might shed some light on if it's timeout related

@zetaron
Copy link
Contributor Author

zetaron commented Jul 18, 2021

I’ll see to get some, in the UI there are timestamps but the downloaded file seems to miss them.

Oddly enough from what I observed it was related to events being handled as it only happened after the first events started coming in no matter how quick or long that took.
But I’ll try to solidify that with some more log data maybe a longer timeframe to be able to spot recurrence.

@zetaron
Copy link
Contributor Author

zetaron commented Jul 19, 2021

After following your hint on timeouts and some research on redis timeouts in relation to DigitalOcean Managed Redis I found this Question on how to set it to 0 (disabled)

The default timeout for the managed Redis seems to be: 300

I've gathered some more logs, but apart from an overwhelming amount of the following lines occurring every 2-3 seconds apart, which now makes total sense since the default connection timeout is 300ms.


I'll try and have DigitalOcean that set to 0 on my kodiak instance by opening a Ticket with them.

Sorry for the comotion but maybe this helps someone else stumbling into similar issues :)

@zetaron
Copy link
Contributor Author

zetaron commented Aug 5, 2021

After DigitalOcean set the timeout to 0 the issue got resolved :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants