
RedisTimeoutException exception message question #973

Closed

CDargis opened this issue Oct 12, 2018 · 42 comments

CDargis commented Oct 12, 2018

Hi. Using version 2.0.495. I've seen this page and found some great information, but I have a question about some of the information in the timeout exception message. As an example:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536, serverEndpoint: ***, mgr: 10 of 10 available, clientName: ***, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=68,Free=32699,Min=2,Max=32767)

I understand that Busy > Min means we're waiting up to 500ms for a new thread to be added to the pool. What I am wondering is why the message says mgr: 10 of 10 available. Shouldn't the mgr count read 0 of 10 available, since we are waiting on the global thread pool? In pretty much all of our timeout errors there are at least 9 of 10 available (and we have a lot of timeouts). I've seen this with both worker and IOCP thread counts greater than the min.
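
For anyone comparing numbers later: the IOCP/WORKER figures in the message map onto the runtime thread-pool counters. A minimal sketch of how to reproduce them yourself (this is not SE.Redis's exact code):

    using System;
    using System.Threading;

    static class ThreadPoolSnapshot
    {
        public static void Print()
        {
            ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
            ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
            ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);

            int busyWorker = maxWorker - freeWorker; // "Busy" in the WORKER block
            int busyIocp = maxIocp - freeIocp;       // "Busy" in the IOCP block

            // Busy > Min is the regime where the pool adds threads at roughly
            // one per 500ms, so queued completions can sit waiting.
            Console.WriteLine($"IOCP: (Busy={busyIocp},Free={freeIocp},Min={minIocp},Max={maxIocp})");
            Console.WriteLine($"WORKER: (Busy={busyWorker},Free={freeWorker},Min={minWorker},Max={maxWorker})");
        }
    }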

mgravell (Collaborator) commented Oct 12, 2018 via email

CDargis (Author) commented Oct 12, 2018

Thanks for the prompt response.

For context: we recently upgraded from 1.2 and have seen 10x more timeouts. We use Redis to cache our NoSQL objects (which we can pull from Dynamo under the same load with no issues). The cached objects are around 200-300 KB; some can be a little bigger. We now realize the recommendation is to keep objects at around 100 KB, and we'll be making those changes. We use hashed objects for most of the larger objects and shoot for bucket sizes of 1000.

I can say we experience these timeouts when bursts of parallel async work come in (Task.WhenAll()). Previously we did not need to set min thread counts, but now we're setting them as high as 1000 to resolve these timeouts. With the min thread count this high we can get through the timeouts, but everything is much more performant going straight to Dynamo (we don't need to touch the min thread count for Dynamo).

EDIT: It is possible there are other contributing factors to our recent surge in timeouts, such as changes to the shape and size of our data.

Is there any more context I can give you?

Some follow-up questions:

  1. does the above give you any more insight?
  2. would you mind pointing me at the relevant code for why the custom pool would report idle?
  3. the in number looks like a default buffer size, which tells me maybe 0 bytes were read? What are reader threads?

CDargis (Author) commented Oct 19, 2018

Another error that may help provide insight here:

StackExchange.Redis.RedisConnectionException: SocketFailure on master.pp-app-cache.6koqgr.use1.cache.amazonaws.com:6379/Interactive, Idle, last: HGET, origin: ReadFromPipe, outstanding: 144, last-read: 930s ago, last-write: 29s ago, unanswered-write: 930s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 0 of 10 available, in: 0, last-heartbeat: 20s ago, last-mbeat: 4s ago, global: 0s ago, v: 2.0.513.63329 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.BeginReceive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, AsyncCallback callback, Object state)
at System.Net.Sockets.NetworkStream.BeginRead(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)

This is the first time I've seen 0 of 10 available in our errors, but I think the most interesting part here is:

last-read: 930s ago, last-write: 29s ago,

930 seconds is more than 15 minutes. No idea why the host would be shutting down the connection.

danmml commented Oct 24, 2018

I'm having similar timeout issues with the 2.0+ libraries compared to the 1.2 version. The issue seems to come from what looks like higher server load/server-side retries (it starts to fall apart for me around 2k connections and 20% server load on an Azure Redis C6 instance). The general errors I see are similar to the example below:

Timeout awaiting response (15750ms elapsed, timeout is 15000ms), inst: 0, qs: 40162, in: 0, serverEndpoint: Unspecified/...:6380, mgr: 7 of 10 available, clientName: ..., IOCP: (Busy=1,Free=999,Min=512,Max=1000), WORKER: (Busy=52,Free=32715,Min=1024,Max=32767)

Based on this, the qs numbers seem really high. My understanding from the documentation is that these are outstanding requests that have not yet been processed. I'm trying to figure out how these numbers got so high, since based on my code I don't think it can create more than 5k requests. Is there a way to clear out outstanding requests, or flush a connection on timeout? I've looked through the source code, but it's not clear to me how the library clears out requests that have been "abandoned" by the client.

@mgravell: Is there a recommended set of settings for high loads with the 2.0 libraries? I'm experimenting with some different settings now, although the general consensus I'm coming to is that the retry/settings that worked for 1.2 don't seem to behave well for us with the 2.0 libraries.
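
For reference, the knobs I'm experimenting with; a minimal sketch where the endpoint, password, and values are placeholders rather than recommendations:

    using StackExchange.Redis;

    var options = ConfigurationOptions.Parse("mycache.example.net:6380,ssl=true,password=***");
    options.SyncTimeout = 15000;        // ms; applies to synchronous calls
    options.AsyncTimeout = 15000;       // ms; 2.x times out async calls separately
    options.ConnectRetry = 3;           // retries during the initial connect
    options.AbortOnConnectFail = false; // keep reconnecting in the background
    var muxer = ConnectionMultiplexer.Connect(options);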

@vanderlei-dev

I'm having the same error on a website, but only when I have peak access (more requests than my average).
Both the IIS and Redis servers are large enough, and the resources (network traffic, CPU, and memory) are just fine.

I already increased the thread counts in machine.config and even implemented a "connection pool" as described in https://stackexchange.github.io/StackExchange.Redis/Timeouts, but no luck with that; a sketch of what that pool looked like is below.
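
The pool is roughly the shape described in that article; a minimal sketch where the pool size and endpoint are placeholders:

    using System;
    using System.Linq;
    using System.Threading;
    using StackExchange.Redis;

    public static class RedisPool
    {
        private const int PoolSize = 4; // placeholder size
        private static int _next = -1;
        private static readonly Lazy<ConnectionMultiplexer>[] Pool =
            Enumerable.Range(0, PoolSize)
                      .Select(_ => new Lazy<ConnectionMultiplexer>(
                          () => ConnectionMultiplexer.Connect("localhost:6379")))
                      .ToArray();

        // Round-robin across a few multiplexers so one slow payload
        // can't stall every in-flight command at once.
        public static ConnectionMultiplexer Get()
        {
            int i = Interlocked.Increment(ref _next) & int.MaxValue; // stay non-negative on wrap
            return Pool[i % PoolSize].Value;
        }
    }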

Here is a sample of the errors:

Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 65536,mgr: 0 of 10 available,  IOCP: (Busy=1,Free=2399,Min=1200,Max=2400), WORKER: (Busy=19,Free=2381,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=17,Free=2383,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 8080, mgr: 9 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=21,Free=2379,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 6 of 10 available,  IOCP: (Busy=4,Free=2396,Min=1200,Max=2400), WORKER: (Busy=34,Free=2366,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing HMSET (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=7,Free=2393,Min=1200,Max=2400), v: 2.0.513.63329 
Timeout performing GET   (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 9 of 10 available,  IOCP: (Busy=0,Free=3200,Min=1600,Max=3200), WORKER: (Busy=11,Free=3189,Min=1600,Max=3200), v: 2.0.513.63329 
Timeout performing GET 	 (5000ms), inst: 0, qs: 1, in: 0, 	 mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=11,Free=2389,Min=1200,Max=2400), v: 2.0.513.63329
Timeout performing HMSET (5000ms), inst: 0, qs: 2, in: 0, 	 mgr: 10 of 10 available,  IOCP: (Busy=0,Free=2400,Min=1200,Max=2400), WORKER: (Busy=12,Free=2388,Min=1200,Max=2400), v: 2.0.513.63329

I noticed the "in" amount in one of the log messages (in: 65536) is exactly the same as @CDargis's (!?). Does that mean something? Any ideas, @mgravell?

CDargis (Author) commented Oct 30, 2018

@Vanderleipires

I've also implemented connection pooling, and it doesn't seem to help much. We have a typical cache-aside pattern with Redis/Dynamo.

What I noticed is that we experience the timeouts when large amounts of async work come in (await Task.WhenAll()), and we were sometimes passing lots of tasks there. These tasks spawn lots of other tasks that hit the cache and DB. Instead of passing all of the tasks at once, I implemented a parallel async extension that takes the degree of parallelism into account. So instead of launching all of the tasks at once with Task.WhenAll(), we set the degree of parallelism to Environment.ProcessorCount. This is definitely slower, but it alleviates a lot of the timeouts. We've also stopped caching in a lot of spots. For the time being this is the solution we've gone with; a sketch of the extension is below.
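
A minimal sketch of that extension; the name and shape are ours, not part of the library:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    public static class ParallelAsync
    {
        // Runs body over source with at most degreeOfParallelism operations
        // in flight, instead of handing everything to Task.WhenAll() at once.
        public static async Task ForEachAsync<T>(
            IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> body)
        {
            using (var throttle = new SemaphoreSlim(degreeOfParallelism))
            {
                var tasks = source.Select(async item =>
                {
                    await throttle.WaitAsync();
                    try { await body(item); }
                    finally { throttle.Release(); }
                });
                await Task.WhenAll(tasks);
            }
        }
    }

    // Usage: await ParallelAsync.ForEachAsync(ids, Environment.ProcessorCount, id => FetchAsync(id));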

hpk (Contributor) commented Nov 1, 2018

We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most).

I've noticed the errors come in three flavors. Here are some representative samples:

  1. RedisTimeoutException:

    StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (5484ms elapsed, timeout is 5000ms), inst: 0, qs: 138, in: 0, serverEndpoint: 10.72.144.135:6379, mgr: 10 of 10 available, clientName: B4D107E329BC, PerfCounterHelperkeyHashSlot: 3602, IOCP: (Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=13,Free=32754,Min=150,Max=32767), v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

  2. RedisConnectionException: No connection is available to service this operation:

    StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: HGETALL SR:reporter:500:d747d528-2b6b-465f-aa94-b9af4c1b705e; IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=11,Free=32756,Min=200,Max=32767), Local-CPU: n/a

  3. RedisConnectionException: SocketFailure:

    StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/TimedOut, last-recv: 5) on 10.72.144.135:6379/Interactive, Idle, last: INCR, origin: ReadFromPipe, outstanding: 323, last-read: 134942s ago, last-write: 0s ago, unanswered-write: 134942s ago, keep-alive: 10s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.513.63329 ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

We have hundreds of containers in production all talking to the same ElastiCache cluster, but these connection errors only occur on one host at a time. I.e., it's not a general Redis outage, it's something localized to a specific container.

hpk (Contributor) commented Nov 28, 2018

Any update on this? There seems to be a pretty noticeable increase in connection issues reported for v2.x.

To expand on my previous comment, the primary issue isn't that we see a handful of connection errors/timeouts here and there. It's that we'll often see a single container suddenly just start spewing hundreds of these connection errors. SE.Redis fails to recover and the only resolution is to kill the container.

@Bogatinovski

We have the same scenario as @hpk. Is there any way to fix this?

@ispysoftware

Got the same issue here: the cache was working fine with 1.2.6, but it's continually falling over with 2.x.

unkusr007 commented Apr 30, 2019

Hello, same kind of issue in 2.0, but I've only had it since I started using ElastiCache with Redis 4.0.
Timeouts happen when I delete a large number of keys and receive a lot of keyspace notifications.
But debugging on my computer against this same database I have no issue; it only happens in prod (a container in ECS).

Timeout performing SREM (30000ms), next: EXISTS cache:thing:stuff:1437bc51-401d-4f35-b0a7-107bd50fdb60:en-US, inst: 1, qu: 0, qs: 91, aw: False, rs: ReadAsync, ws: Idle, in: 848, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/**************:6379, mgr: 10 of 10 available, clientName: 3e5d13637a65, IOCP: (Busy=0,Free=1000,Min=36,Max=1000), WORKER: (Busy=90,Free=32677,Min=36,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I tried adding way more vCPUs, but the issue is the same.

satuday commented Aug 29, 2019

Hi, I'm experiencing the same issue in a .NET Core 2.1 project. Any ideas? I'm not seeing a lot of these, on average around 5 to 10 per day.

Remove cache failed Timeout performing UNLINK organisation-******** (5000ms), inst: 1, qs: 8, in: 352, serverEndpoint: Unspecified/******:6379, mgr: 10 of 10 available, clientName: 580a039d22ca, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=7,Free=32760,Min=2,Max=32767), v: 2.0.513.63329 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2197
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.KeyDelete(RedisKey key, CommandFlags flags) in C:\projects\stackexchange-redis\src\StackExchange.Redis\RedisDatabase.cs:line 590

Alik2015 commented Sep 5, 2019

We are getting:
Connection id "0HLPHO068DD3J", Request id "0HLPHO068DD3J:00000001": An unhandled exception was thrown by the application.
StackExchange.Redis.RedisTimeoutException: Timeout performing EVAL, inst: 7, queue: 9, qu: 0, qs: 9, qc: 0, wr: 0, wq: 0, in: 58988, ar: 0, clientName: , serverEndpoint: Unspecified/*:6379, keyHashSlot: 14678 .

AWS Redis 5, with the latest StackExchange.Redis package.

@RajaInzamam

Hello all,
StackExchange.Redis.StrongName version: 1.2.1
Microsoft.Web.RedisSessionStateProvider version: 2.2.6.344

All of a sudden our production server (Azure) started throwing exception messages:

Exception Category 1 "Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 17, qu: 0, qs: 17, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName......"

Exception Category 2 "No connection is available to service this operation: EVAL; A connection attempt failed because the connected party did not properly respond after a period of time..."

Is this something related to thread pool throttling?

@limoniapps

Same problem here:

Exception occurred: Timeout performing HGET (5000ms), next: HGET ....com, inst: 3, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/....redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD..., IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=8183,Min=1,Max=8191), v: 2.0.601.3402

@hiral-cloud-evangelist

Getting random errors:
{"code":500,"message":"Timeout awaiting response (6625ms elapsed, timeout is 5000ms), inst: 0, qs: 5, in: 0, serverEndpoint: Unspecified/ .. :6380, mgr: 8 of 10 available, clientName: RD0003FF00D3EC, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=4,Free=678,Min=2,Max=682), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)","status":500}

mgravell (Collaborator) commented Jan 28, 2020

Hi all; I'm sorry for the delay getting back to this thread - lots of plates spinning, and I simply haven't been available. I'm ramping up some investigations into timeouts, so if this is still impacting you: I'm genuinely here again!

Some things I'm keenly interested in for all of these reports, because they radically change the moving pieces:

  • what framework exactly? .NET Framework 4.7.2? .NET Core App 3.0.1? etc
  • is there TLS involved? (this usually means "cloud provider", but it could also mean "custom ssl tunnel" or "redis 6 preview")
  • what is the main "rs" value in the error log?
  • was the server actually jammed at the time?

(For example, on the "rs" one: yesterday I was talking to someone who was seeing rs: CompletePendingMessage, which sounds like there may be a TPL issue re thread-stealing; I'm investigating that possibility currently.)

blinker001 commented Jan 28, 2020

Hi @mgravell, nice to see that you're back.

Exception thrown: 'StackExchange.Redis.RedisTimeoutException' in System.Private.CoreLib.dll: 'Timeout awaiting response (5615ms elapsed, timeout is 5000ms), inst: 0, qs: 90901, in: 0, serverEndpoint: Unspecified/localhost:6379, mgr: 9 of 10 available, clientName: Georgis-MBP, IOCP: (Busy=0,Free=1000,Min=16,Max=1000), WORKER: (Busy=1,Free=32766,Min=16,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)'

I'm on macOS, .NET Core 3.1.101.
This error happens on my local machine.

dplarina commented Jan 28, 2020

Timeout performing SET (3000ms), next: PTTL+GET us:staging:campaignInfoCache, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 8250, serverEndpoint: Unspecified/ihubapp.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: us-ihubapp-codeblack-west, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=2,Free=32765,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

For me the rs value seems to always be ReadAsync.
Windows Server, .NET 4.7.2, Azure Redis, using TLS.
Redis load is very low.

@jasoncono

OS: Windows 2012 R2
Framework: .NET 4.7.2
Endpoint: AWS ElastiCache Redis without TLS

rs seems to be ReadAsync most of the time.
Several servers hang during this period, and then they all free up.

Here are a few of the different logged errors:

RedisTimeoutException: Timeout performing SETEX (1000ms), inst: 7, qu: 0, qs: 78, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 299, IOCP: (Busy=1,Free=999,Min=16,Max=1000), WORKER: (Busy=100,Free=32667,Min=100,Max=32767), v: 2.0.600.65315

RedisTimeoutException: Timeout performing GET (1000ms), inst: 2, qu: 0, qs: 32, aw: False, rs: ReadAsync, ws: Idle, in: 2328, in-pipe: 100, out-pipe: 10781, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16330, IOCP: (Busy=18,Free=982,Min=16,Max=1000), WORKER: (Busy=183,Free=32584,Min=100,Max=32767), v: 2.0.600.65315

RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1031ms elapsed, timeout is 1000ms), command=SET, inst: 0, qu: 0, qs: 44, aw: True, rs: ReadAsync, ws: Writing, in: 0, in-pipe: 0, out-pipe: 31046, serverEndpoint: x.x.x.x:6379, mgr: 10 of 10 available, clientName: 06F02D, PerfCounterHelperkeyHashSlot: 16138, IOCP: (Busy=20,Free=980,Min=16,Max=1000), WORKER: (Busy=209,Free=32558,Min=100,Max=32767), v: 2.0.600.65315

@mgravell (Collaborator)

@jasoncono the first one looks like a genuine server (or network) stall (nothing inbound or outbound); the other two... could be just about anything, annoyingly; yet more data points, at least...

hpk (Contributor) commented Jan 29, 2020

To follow up on my earlier comment, below are some more error examples. We're connecting to a Redis cluster that never sees higher than 15% CPU usage. No TLS is involved, and there's nothing in the Redis slowlog to indicate that the server is getting overwhelmed.

We have a couple of environments, and see these errors in both:

OS: Windows Server 2019 in a Docker image on AWS ECS
Framework: .NET Framework 4.7.2

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (765ms elapsed, timeout is 500ms), inst: 0, qs: 64, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 6 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 10610, IOCP: (Busy=1,Free=999,Min=150,Max=1000), WORKER: (Busy=5,Free=32762,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (969ms elapsed, timeout is 500ms), inst: 0, qs: 66, in: 131072, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: 756E16032272, PerfCounterHelperkeyHashSlot: 7195, IOCP: (Busy=0,Free=1000,Min=150,Max=1000), WORKER: (Busy=9,Free=32758,Min=150,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

OS: Windows & Linux, running in Docker images on AWS EKS
Framework: .NET Core 2.2

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (952ms elapsed, timeout is 50ms), inst: 0, qs: 1, in: 6, serverEndpoint: <x.x.x.x>:6379, mgr: 10 of 10 available, clientName: <--redacted-->, PerfCounterHelperkeyHashSlot: 943, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=26,Free=32741,Min=2,Max=32767), v: 2.0.519.65453 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

The nodes indicated by serverEndpoint vary, so it's not just a single cluster node.

Happy to provide any more information you need; just let me know!

CDargis (Author) commented Jan 31, 2020

Endpoint: AWS ElastiCache Redis

  • what framework exactly? .NET Framework 4.7.2? .NET Core App 3.0.1? etc

.NET Framework 4.7.2

  • is there TLS involved? (this usually means "cloud provider", but it could also mean "custom ssl tunnel" or "redis 6 preview")

Yes. AWS configuration of Redis says "Encryption in-transit: Yes"

  • what is the main "rs" value in the error log?

I do not see this in the logs

  • was the server actually jammed at the time?

The metrics on the server seemed normal.

@NickCraver (Collaborator)

We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?

.NET Framework Web Projects

In web.config, either explicitly setting

<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />

or, something less than 4.5 in <httpRuntime> (also web.config):

<httpRuntime targetFramework="4.x" />

.NET Framework Non-web Projects

In app.config, something less than 4.5 in <supportedRuntime>:

<configuration>
    <startup> 
        <supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>
    </startup>
</configuration>

To clarify: it doesn't matter what your .csproj (or other) target framework is for this case. These .config flags define runtime quirk behavior and alter how threads are handled - resulting in some of our should-be-dedicated bits being stolen and our queue indefinitely hung in a bad state.

If this matches anyone here, it'd be hugely helpful to know. @mgravell is working on a workaround for this scenario.

afinzel (Contributor) commented Mar 4, 2020

Hey @NickCraver, @mgravell

We have

aspnet:UseTaskFriendlySynchronizationContext set to true

I can't see that we have httpRuntime set.

We were having issues with version 1.2.6 in production. Occasionally the queue on one of our servers would stop processing and the queue number kept incrementing. We put a fudge in so that if we get x of these consecutively, we recreate the multiplexer (the general shape is sketched at the end of this comment). Looking around your project we saw you had rewritten this, so we upgraded to the latest version. That upgrade is currently failing in QA, as we get similar issues under load.

"StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), inst: 0, qu: 1187, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/removed, mgr: 10 of 10 available, clientName: Removed IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=5,Free=32762,Min=600,Max=32767), v: 2.0.601.3402"

We too are using AWS ElastiCache. I am not sure how much time you have to dedicate to this project, but I could probably recreate this with you in our staging environment if that would help.
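
For anyone curious, the fudge was roughly this shape; a minimal sketch where the threshold, endpoint, and names are ours:

    using System;
    using System.Threading;
    using StackExchange.Redis;

    public static class RedisReconnect
    {
        private const int TimeoutThreshold = 10; // placeholder for the "x" above
        private static int _consecutiveTimeouts;
        private static Lazy<ConnectionMultiplexer> _muxer = Create();

        private static Lazy<ConnectionMultiplexer> Create() =>
            new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect("localhost:6379"));

        public static ConnectionMultiplexer Connection => _muxer.Value;

        // Call on every RedisTimeoutException: after enough consecutive
        // timeouts, swap in a fresh multiplexer and dispose the old one.
        public static void OnTimeout()
        {
            if (Interlocked.Increment(ref _consecutiveTimeouts) < TimeoutThreshold) return;
            Interlocked.Exchange(ref _consecutiveTimeouts, 0);
            var old = Interlocked.Exchange(ref _muxer, Create());
            if (old.IsValueCreated) old.Value.Dispose();
        }

        // Call after any successful operation to reset the streak.
        public static void OnSuccess() => Interlocked.Exchange(ref _consecutiveTimeouts, 0);
    }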

hshazli commented Mar 12, 2020

I have a similar issue. It worked fine for 4 months, and suddenly I get this error:

Timeout performing EVAL (5000ms), next: EVAL, inst: 129, qu: 0, qs: 125, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/----.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0003FF551B49, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=210,Free=32557,Min=200,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags)
at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass7_0.b__0() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
at Microsoft.Web.Redis.StackExchangeClientConnection.OperationExecutor(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 95
at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 122
at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
at Microsoft.Web.Redis.RedisConnectionWrapper.TryUpdateAndReleaseLock(Object lockId, ISessionStateItemCollection data, Int32 sessionTimeout) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisConnectionWrapper.cs:line 363
at Microsoft.Web.Redis.RedisSessionStateProvider.d__25.MoveNext() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 433
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SessionState.SessionStateModuleAsync.d__80.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SessionState.TaskAsyncHelper.EndTask(IAsyncResult ar)
at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.<>c__DisplayClass285_0.b__0()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

afinzel (Contributor) commented Mar 12, 2020

@hshazli your workers just hit 210 and your min was 200. Do you think increasing your min workers would help? I'm not an expert in this, but you could try it.

hshazli commented Mar 12, 2020

@afinzel but that's just this one error message; in other error messages the worker count hits 300 or above. What min value should I use, and how do I select the proper one?
Also, what about this line: serverEndpoint: Unspecified/----.redis.cache.windows.net:6380. Is that normal?

@NickCraver (Collaborator)

@hshazli Yep, that's normal. In your case, it seems to be a very overloaded thread pool and it wasn't able to pull bytes off the queue fast enough during a spike. It's possible network blips resulting in spikes of incoming traffic contributed to that.

@afinzel I believe your issue is different (#1120), but there's a fix queued in #1374 for it. Does yours happen after a network blip of some sort (even AWS failing over)? That's the situation there; the connection is hung in a bad state afterwards.

afinzel (Contributor) commented Mar 25, 2020

@NickCraver, I didn't think so, but I see that change is merged, so I will get the latest version and retest. Thanks.

EDIT: It looks like it isn't on NuGet yet.

anilguzel commented Apr 27, 2020

Hi, I've been using Redis efficiently for over a month, but I just got this error yesterday.
There was no visible crash or failover, so it's hard to get anything meaningful out of it.

Logged:

Timeout performing ZSCAN (5000ms),
next: ZSCAN xxx, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 18, in-pipe: 0, out-pipe: 0,
serverEndpoint: xx.euc1.cache.amazonaws.com:6379,
mgr: 10 of 10 available,
IOCP: (Busy=0,Free=1000,Min=2,Max=1000),
WORKER: (Busy=37,Free=32730,Min=2,Max=32767), v: 2.1.0.1

Info:

Endpoint: AWS ElastiCache Redis

.NET Core App 2.2.107

AWS configuration of Redis says "Encryption in-transit: No"

The metrics on the server seemed normal.

I hope that's helpful, thanks.

@michaelrod64

(Quoting @NickCraver's comment above about the .NET Framework config flags.)

@NickCraver, just wanted to clarify here:

Is the following fine in my App.config?

<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.2" />

According to this Microsoft documentation, any .NET Framework version from 4.0 to 4.8 should use "v4.0":

https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/startup/supportedruntime-element

Your comment, which had

<supportedRuntime version="v4.x" sku=".NETFramework,Version=v4.x"/>

seemed to imply that we could have a value greater than v4.0 for "version".

vhmoura commented Jul 1, 2020

Hi there,

Also getting a similar error. This is running in an Azure Function, built with .NET Core 2.2, using version 2.1.39.

System.Exception: Error fetching customer tags for devices 'xxxx'. ---> StackExchange.Redis.RedisTimeoutException: Timeout performing EXISTS (5000ms), next: EXISTS xxxxxxx, inst: 1, qu: 0, qs: 16, aw: False, rs: ReadAsync, ws: Idle, in: 850, serverEndpoint: tttttt:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxx, IOCP: (Busy=16,Free=984,Min=6,Max=1000), WORKER: (Busy=0,Free=32767,Min=6,Max=32767), v: 2.1.39.39788 (Please take a look at this article for some common client-side issues that can cause timeouts:

https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2605
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at StackExchange.Redis.RedisDatabase.KeyExists(RedisKey key, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 668

Any help appreciated.

Cheers,
Vini

@SKamalou

I get the errors below when trying to get or set big data (about 70 megabytes). Any ideas? Can I increase the Redis timeout?

I use Redis on Windows.
StackExchange.Redis version: 2.1.58

What framework exactly? .NET Framework 4.6.1
<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.6.1" />

Is there TLS involved? No.
What is the main "rs" value in the error log? TryParseResult.
Was the server actually jammed at the time? No.

Timeout performing GET (5000ms), next: GET 8a5d01ea-6017-4110-a1f7-42f63e309ad3, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 81920, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 0 of 10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Timeout performing GET (5000ms), next: GET 9b181d9d-98dc-488a-998c-aded191c2e5d, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 61027, in-pipe: 83549, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 3 of 10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Timeout performing GET (5000ms), next: GET 2db678e5-4a12-44be-98fd-173cd64fa0cd, inst: 0, qu: 0, qs: 1, aw: False, rs: TryParseResult, ws: Idle, in: 65536, in-pipe: 147456, out-pipe: 0, serverEndpoint: 172.18.6.40:6379, mc: 1/1/0, mgr: 2 of 10 available, clientName: 4GON-SERVICES2, IOCP: (Busy=2,Free=998,Min=100,Max=1000), WORKER: (Busy=8,Free=2039,Min=20,Max=2047), v: 2.1.58.34321 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
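
On the question of raising the timeout: both timeouts are configurable via the connection string, though a 70 MB value will still occupy the shared connection while it transfers. A minimal sketch; the values here are illustrative:

    using StackExchange.Redis;

    // syncTimeout/asyncTimeout are in milliseconds; raising them masks,
    // rather than fixes, the cost of very large values on one connection.
    var muxer = ConnectionMultiplexer.Connect(
        "172.18.6.40:6379,syncTimeout=30000,asyncTimeout=30000");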

mgot90 commented Oct 14, 2020

This issue lives on, 2 years later. @CDargis was definitely onto something with "we only experience this when running many tasks (Task.WhenAll())". In my case we see this when doing thousands of GETs inside Parallel.ForEach. Setting MaxDegreeOfParallelism = Environment.ProcessorCount was a big help (a sketch is below), but it does not fix everything. Has anyone else found a better solution?
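
A minimal sketch of that mitigation; the endpoint and key list are placeholders:

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    var muxer = ConnectionMultiplexer.Connect("localhost:6379"); // placeholder endpoint
    IDatabase db = muxer.GetDatabase();
    string[] keys = { "k1", "k2", "k3" }; // placeholder key list

    // Cap concurrency instead of letting Parallel.ForEach fan out
    // thousands of blocking GETs across the whole thread pool.
    var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
    Parallel.ForEach(keys, options, key =>
    {
        RedisValue value = db.StringGet(key);
        // ... use value ...
    });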

mgot90 commented Oct 14, 2020

Found a fix here: change GetDatabase to return an IDatabaseAsync instead of an IDatabase and the problem disappears.
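
A minimal sketch of that change; IDatabase implements IDatabaseAsync, so no library modification is needed (the endpoint and key are placeholders):

    using System;
    using StackExchange.Redis;

    var muxer = ConnectionMultiplexer.Connect("localhost:6379"); // placeholder endpoint

    // Hand callers the async interface so they await instead of
    // blocking thread-pool threads on the shared connection.
    IDatabaseAsync db = muxer.GetDatabase();
    RedisValue value = await db.StringGetAsync("some-key");
    await db.StringSetAsync("some-key", "some-value", TimeSpan.FromMinutes(5));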

jeffras commented Nov 3, 2020

Found a fix here: change GetDatabase to return an IDatabaseAsync instead of an IDatabase and the problem disappears.

@mgot90
We are currently struggling with this. We have a process using the cache and Redis transactions. When scaling the process to handle many concurrent requests, we hit the timeouts almost immediately. Can you provide more info on the fix you discovered? The only fix we have found is to limit concurrent requests to Redis, which defeats the purpose of the multiplexing, and we take a huge performance hit.

arefinsami commented Dec 16, 2020

Having the same exception without any obvious reason.

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5641ms elapsed, timeout is 5000ms), command=GET, next: GET <MyKey>, inst: 0, qu: 0, qs: 1, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/<myredis>.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0009DFC8C24D, IOCP: (Busy=0,Free=1000,Min=50,Max=1000), WORKER: (Busy=3,Free=8188,Min=50,Max=8191), Local-CPU: unavailable, v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

I set both IncludePerformanceCountersInExceptions and IncludeDetailInExceptions to true, but still no CPU information.

mgot90 commented Dec 16, 2020

@jeffras I'm just another dev trying to fix the same problem. Here's what we did:

In the end, we still have this happening every now and then. We're using Azure as our Redis provider. I haven't ruled out that our pricing tier is also a factor: https://azure.microsoft.com/en-us/pricing/details/cache/

Not sure if it's useful to you, but here are the Azure best practices as well: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices

Did you guys have luck or try anything different?

jeffras commented Dec 17, 2020

@mgot90 Thanks for the info. Did you modify the StackExchange.Redis code for IDatabaseAsync? I did see it in the source, but I didn't want to maintain a branch to accomplish this. We still have the problem. We ended up having to wrap our code in a lock and it went away, but our throughput clearly suffers.

@PrzemekGawron

We've been dealing with these kinds of errors for months (if not years). Three months ago we decided to reimplement our caching layer with the following design assumptions:

  • we do not store anything larger than 100KB in Redis
  • objects larger than 100KB are either stored in BlobStorage or the logic falls back to the primary source (it is configurable and depends on the use case)

I'm aware that this approach might not be ideal for everyone and has some performance cost (you need to check the size), but at least it may shed some more light; a sketch of the guard is below.
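
A minimal sketch of that size guard (the limit, names, and fallback policy are ours, not a library feature):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    public static class SizeGuardedCache
    {
        private const int MaxCacheBytes = 100 * 1024; // the 100KB cap described above

        // Returns false when the payload is too large to cache, so the caller
        // can fall back to blob storage or the primary source instead.
        public static Task<bool> TrySetAsync(IDatabaseAsync db, string key, byte[] payload, TimeSpan ttl)
        {
            if (payload.Length > MaxCacheBytes)
                return Task.FromResult(false);
            return db.StringSetAsync(key, payload, ttl);
        }
    }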

Our setup:

  • Azure Cache for Redis (we've tried multiple pricing tiers, without significant effect)
  • Azure App Services (net46, netcore11 -> netcore31), later migrated to AKS (netcore31)

Neither combination worked until we introduced the changes mentioned at the beginning, which I think is the kind of recommendation you can infer from the docs, and also the suggestion we got from Azure Support at the end of this long journey.

@NickCraver (Collaborator)

Going to close this out as cleanup: we had lots of different cases leading to timeouts across a myriad of environments and combinations. Many of these reports led to tweaks, improved handling, and much better error messages giving us more information. I want to close this old issue out so that anyone hitting something today, on a much newer version with a newer error message, can more easily get help.
