Unable to reconnect to Redis Cluster in Kubernetes after cluster rollout #56

Closed
zombiezen opened this issue Aug 12, 2022 · 12 comments

Comments

@zombiezen
Contributor

Thanks for the library! We're trying to use this library in a server running in Kubernetes connecting to a Redis Cluster running in the same Kubernetes cluster. We're connecting to Redis using redis-cluster://my-redis-cluster:6379 as the RedisConfig with a RedisPool.

We noticed that when we roll out a new version of the Redis deployment, our fred-using server hangs trying to send commands to the Redis Cluster. The logs we're seeing are of the form:

Error creating or using backchannel for cluster nodes: Redis Error - kind: IO, details: Os { code: 110, kind: TimedOut, message: "Connection timed out" }
Failed to reconnect with error Redis Error - kind: Cluster, details: Failed to read cluster nodes on all possible backchannel servers.

After we restart our fred-using server with the same configuration, it connects to the Redis Cluster just fine.

After adding some logging, it seems that what's happening is that the IP addresses on the cluster cycle out to different ones, but IIUC fred doesn't try to re-resolve the DNS address to reconnect to the cluster.
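
The reconnect behavior described above can be sketched with the standard library alone. This is a minimal illustration, not fred's implementation: `reconnect_with` and `os_resolve` are hypothetical helper names, and the only point is that the hostname is resolved again before every attempt, so replaced pod IPs are picked up instead of the addresses cached at startup.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolve a `host:port` string with the OS-provided resolver.
/// Hypothetical helper: meant to be called on every reconnect attempt.
pub fn os_resolve(host_port: &str) -> io::Result<Vec<SocketAddr>> {
    Ok(host_port.to_socket_addrs()?.collect())
}

/// Try to connect up to `max_attempts` times, calling `resolve` again before
/// each attempt rather than reusing addresses from an earlier lookup.
pub fn reconnect_with<T, R, C>(
    mut resolve: R,
    mut connect: C,
    max_attempts: usize,
) -> io::Result<T>
where
    R: FnMut() -> io::Result<Vec<SocketAddr>>,
    C: FnMut(&SocketAddr) -> io::Result<T>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no attempts made");
    for _ in 0..max_attempts {
        // Fresh lookup on every attempt; a failed lookup counts as a failed attempt.
        let addrs = match resolve() {
            Ok(addrs) => addrs,
            Err(e) => {
                last_err = e;
                continue;
            }
        };
        // Try each resolved address in order and return the first success.
        for addr in &addrs {
            match connect(addr) {
                Ok(conn) => return Ok(conn),
                Err(e) => last_err = e,
            }
        }
    }
    Err(last_err)
}
```

With a real socket this could be called as `reconnect_with(|| os_resolve("my-redis-cluster:6379"), |addr| std::net::TcpStream::connect_timeout(addr, std::time::Duration::from_secs(5)), 5)`. A production client would also want a delay or backoff between attempts, which is left out here for brevity.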

@aembke
Owner

aembke commented Aug 15, 2022

Yeah, good find @zombiezen. This is one of the reasons why the next release will be a major release. The changes necessary to the connection management plumbing are too invasive to do in a patch or minor release, but they're required to fix this issue.

I'm about halfway through those changes now, and my goal is to release them by the end of the month or shortly after.

@aembke
Owner

aembke commented Aug 15, 2022

Out of curiosity - would you find it useful to override DNS resolution logic? I've been debating whether to expose an interface for doing this (similar to hyper).
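
For illustration, an overridable resolver could look roughly like the sketch below. Everything here is an assumption made for the example: the `Resolve` trait, `OsResolver`, `StaticResolver`, and `first_addr` are hypothetical names, and this is neither fred's nor hyper's actual interface. It only shows the shape of the idea, with the OS resolver as the default and a fixed lookup table as one possible override.

```rust
use std::collections::HashMap;
use std::io;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Hypothetical hook a client could call whenever it needs addresses for a host.
pub trait Resolve {
    fn resolve(&self, host: &str, port: u16) -> io::Result<Vec<SocketAddr>>;
}

/// Default behavior: defer to the OS-provided resolver.
pub struct OsResolver;

impl Resolve for OsResolver {
    fn resolve(&self, host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
        Ok((host, port).to_socket_addrs()?.collect())
    }
}

/// Example override: answer from a fixed table instead of querying DNS.
pub struct StaticResolver(pub HashMap<String, Vec<IpAddr>>);

impl Resolve for StaticResolver {
    fn resolve(&self, host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
        match self.0.get(host) {
            Some(ips) => Ok(ips.iter().map(|ip| SocketAddr::new(*ip, port)).collect()),
            None => Err(io::Error::new(
                io::ErrorKind::NotFound,
                format!("no entry for {}", host),
            )),
        }
    }
}

/// Connection code depends only on the trait, so resolvers are interchangeable.
pub fn first_addr(resolver: &dyn Resolve, host: &str, port: u16) -> io::Result<SocketAddr> {
    resolver
        .resolve(host, port)?
        .into_iter()
        .next()
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "resolver returned no addresses"))
}
```

Taking `&dyn Resolve` keeps the connection code unaware of where addresses come from, which is what would let an application swap in its own lookup logic.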

@zombiezen
Contributor Author

Thanks! LMK if there's anything else I can do to help in a fix. We have some workarounds for this, but it would definitely save us some operational headaches to have it addressed.

AFAIK we don't need anything fancy for DNS resolution logic: the OS-provided resolver is fine for us.

@aembke
Owner

aembke commented Aug 15, 2022

Sounds good. This is definitely a use case I plan on supporting, and I'll keep you updated on the status of the fix.

@zombiezen
Contributor Author

Hey @aembke, any updates on this or any assistance we can provide? Not having this is causing us some grief in production.

@aembke
Owner

aembke commented Sep 21, 2022

Yeah, my apologies, I'm getting back into it now. There are a couple of PRs folks have submitted that I think are related to this. I'll take a look at the options and go with the fastest one that addresses this.

@casret

casret commented Oct 26, 2022

@zombiezen Curious if you have any workaround in production?

@zombiezen
Contributor Author

We've reverted to using the redis-rs library with a non-clustered Redis for the time being.

@sebastianhopkins-lh

We're working on a new service in Rust and have chosen fred for it; the main reason is that it's the only Redis library that does async, pooling, and clustering all at the same time. We're also running into this exact problem and are looking forward to trying out the next major release. Thanks, all.

@aembke
Owner

aembke commented Dec 12, 2022

Quick update to the folks on this thread - I just published 6.0.0-beta.1 to crates.io. It has an entirely new implementation of the cluster interface and the repros I had for this issue seem to work now with the new version. If you have any feedback on the new interface please let me know.

@sebastianhopkins-lh

I've been using fred at this commit for a few weeks: 36798a2 and I'm not encountering this problem anymore, maybe @zombiezen can check too?

@lytefast

> I've been using fred at this commit for a few weeks: 36798a2 and I'm not encountering this problem anymore, maybe @zombiezen can check too?

I don't want to leave you hanging, but we've since reverted to a sharded Redis system. I don't think we will have time to verify this bug, at least not in our immediate roadmap.

Thanks for the patch though.

aembke closed this as completed on Mar 2, 2023