Timeout issues if running in docker container #2253
It's hard to tell from the demo because I see a few major issues in the demo itself:
Those are the code issues, which I want to keep separate from the Redis usage. Suggestions:
I know the above is a bit vague, but just from looking at the demo, it has so many internal issues with LOH arrays that it's hard to suggest targeted fixes there. Hopefully the above advice at least helps with direction.
Thanks for the feedback. I will try to add some more information about the demo project to clarify the mentioned code issues.
Sure, but they are only there to print out the measurements. The timeout also occurs when all Console logging is removed, so it makes no difference here, and it's helpful to see some output while it runs. There is no Console logging in production code ;)
I heard you and I have changed it in my last commit. Every array allocation is now done only once at the beginning before anything starts.
Also done at the beginning now. :) But I had just followed the official Basic Usage guide, which says this is a cheap pass-thru object that does not need to be stored. ;-)
Even if it is not really parallel, those blocks should demonstrate that such requests can come from different threads. In the production code, we have a singleton Redis cache service that gets requests from different threads. So this parallel foreach is just for demo purposes and should "simulate" our scenario.
That's right; that's why we already use buckets of 1000 keys in the production code, which can also be enabled in the demo project by setting the useBuckets variable to true. Using those buckets improves performance dramatically. But as mentioned, even then it's much slower in Docker than on Windows.
Good point, didn't know that. This could be a future improvement. Thanks for this hint.
Thanks for this one, too. Also an improvement for the future :) I did not add it to the demo project in my last commit.
Thanks for all the information, even if it is a bit vague. :) Meanwhile, I have tested it natively under Linux and it runs without any issues there, too, just to rule out a problem with the runtime on Linux.
Wanted to follow up here: were you able to determine anything more? The only advice I'd have beyond the above is "get a profile in the Docker container", which... I'm not an expert on. But we'd really need to observe what's happening. Is it GC contention? Core/heap issues? Bandwidth restrictions? Some system call overhead? It's such a different environment that it's really hard to say anything other than: profile it and observe what the bottleneck is. Given the LOH bits we saw, I think you may be running into GC-style issues more with Docker than on something exposing more cores, for example. Tools like … I wish I had more specific advice here, but that's the best I can think of as next steps if you're still having issues.
Sadly, I haven't had much time to dive deeper into the issue again. As I wrote, it was working acceptably with the buckets, so it wasn't that urgent anymore. As soon as I have enough time again, I will try to analyze it as you suggested.
I've just migrated a web app into a Linux docker container and started seeing this error:
This error does not occur when the same web app is deployed to a Windows EC2 server. I'm going to test with the latest version tomorrow. We're seeing this error when using Redis as the session provider.
Environment
I'd be happy to provide more info if needed.
Finally, I have some time to give feedback on this issue again. It seems that we have finally solved it. The final solution was to increase the minimum number of threads in the thread pool and to use the ThreadPool for the SocketManager instead of the shared one. With this change, the timeouts are gone and it's working like a charm. I read through some documentation and blog posts to understand the exception completely, and it really seems that the number of busy worker threads exceeded the default minimum, and the thread pool was too slow to create new threads. At least, that's how I understand the issue. Since we're running on .NET 6, I think using the ThreadPool is not an issue. Just for reference, here is the commit that solved our issue: isenmann/Redis.Issue.Testsuite@44f468c. However, from what I have seen, timeouts can have lots of causes, and they are not always solved by increasing the thread count.
Glad you got a resolution here. The thread pool is a constant source of confusion, and I'm thinking of adding a bit to our logging there to help users debug even more. Closing this one out to tidy up!
Hi,
we discovered some strange behaviour when running our code with the StackExchange.Redis client in a Linux Docker container. It seems that inside Docker, the StackExchange.Redis client handles connections (or something else) differently than locally on my own Windows machine.
In both cases, the Redis server runs locally on my machine.
Let me try to explain the scenario:
Those calls are done in 3 threads in parallel.
Locally on Windows (not in Docker), these calls, run in a while loop, took about 300 ms each. That's a little slow, but it works.
BUT when running the code in a Linux Docker container, after two or three loops there are lots of timeouts in nearly every call to Redis. The timeout is set to the default value (5 s).
Because of that issue, we optimized the requests and split the set into buckets of 1000 elements each, which improved the access time dramatically on the local machine. But even with this approach, it is much slower in Docker than on my local Windows machine, and after running for several minutes the requests get slower and slower.
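The bucket idea described above can be sketched in a language-neutral way. The following Python sketch (the demo itself is .NET) splits a large collection into chunks of at most 1000 items, so that each chunk can be stored in its own smaller set and written with one batched command instead of one huge set or thousands of single calls. The `buckets` and `bucket_key` helpers and the `name:index` key scheme are hypothetical illustrations, not the demo repository's actual layout.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical bucket size, mirroring the 1000-element buckets described above.
BUCKET_SIZE = 1000


def buckets(items: Sequence[T], size: int = BUCKET_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def bucket_key(base: str, index: int) -> str:
    """Hypothetical naming scheme: one smaller set per bucket, e.g. 'myset:0'."""
    return f"{base}:{index}"


if __name__ == "__main__":
    members = [f"member-{i}" for i in range(2500)]
    for i, bucket in enumerate(buckets(members)):
        # In real code, each bucket would be written with one batched command
        # (e.g. a single SADD carrying all of the bucket's members).
        print(bucket_key("myset", i), len(bucket))
    # prints:
    # myset:0 1000
    # myset:1 1000
    # myset:2 500
```

Keeping each set bounded also keeps individual replies small, which avoids allocating very large buffers for a single read.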
Below is a link to a repository that demonstrates this issue in a small program. It can be run with or without the bucket optimization. By default, it does not use the bucket version; you can enable it by setting useBuckets to true in Program.cs:line 3.
Redis connection string must be set in the appsettings.json configuration file.
Our main question now is: why do timeouts appear in the non-optimized version only when it runs in Docker and not locally? And why do they appear even in the optimized version after several minutes? We also tried this on AWS with Docker, and the timeouts appear there as well.
Demo project/repository to show the issue
Any help/hint is appreciated.