RedisTimeoutException exception message question #973
Without more context it is hard to know what is happening. On the surface,
the custom pool seems to think it is idle. Whether that is right or not is
complex - I should ensure that is visible in the log. The high "in" is
concerning, though. I hope we haven't found a new way of stealing reader
threads!
On Fri, 12 Oct 2018, 22:53, Chris Dargis wrote:
Hi. Running version 2.0.495. I've seen this
<https://github.com/StackExchange/StackExchange.Redis/blob/master/docs/Timeouts.md>
page and have found some great information. I have a question about some of
the information in the timeout exception message. As an example:
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response
(15281ms elapsed, timeout is 15000ms), inst: 0, qs: 414, in: 65536,
serverEndpoint: ***, mgr: 10 of 10 available, clientName: ***, IOCP:
(Busy=1,Free=999,Min=2,Max=1000), WORKER:
(Busy=68,Free=32699,Min=2,Max=32767)
I understand that Busy > Min = waiting 500ms for a new thread to get added
to the pool. What I am wondering is why mgr: 10 of 10 available -
shouldn't the mgr count read 0 of 10 available since we are in the global
thread pool? In pretty much all of our timeout errors there are at least 9
of 10 available. I've seen this with both worker and IOCP threads being >
than the min.
Thanks for the prompt response. For context: we recently upgraded from 1.2 and have seen 10x more timeouts. We use Redis to cache our NoSQL objects (which we can pull from Dynamo under the same load with no issues). The cached objects are around 200-300k; some can be a little bigger. We now realize the recommendation is to keep objects at 100k, and we'll be making those changes. We use hashed objects for most of the larger objects and shoot for bucket sizes of 1000.

I can say we experience these timeouts when bursts of parallel async work come in (Task.WhenAll()). Previously we did not need to set min thread counts, but now we're setting them as high as 1000 to resolve these timeouts. With the min thread count this high we can get through the timeouts, but everything is much more performant going straight to Dynamo (we don't need to touch the min thread count for Dynamo).

EDIT: It is possible there are other contributing factors to our recent surge in timeouts, such as changes in the shape and size of our data. Is there any more context I can give you? Some follow-up questions:
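The "setting min threads" mitigation described above can also be done in code at application startup rather than in config. A minimal sketch; the values are purely illustrative, not a recommendation:

```csharp
// Sketch: raise the thread-pool minimums at startup so bursts of parallel
// work don't pay the ~500ms-per-thread injection delay once Busy > Min.
using System;
using System.Threading;

class ThreadPoolWarmup
{
    static void Main()
    {
        // Read the current minimums before changing anything.
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        Console.WriteLine($"before: worker={worker}, iocp={iocp}");

        // Raise both minimums; SetMinThreads returns false if the
        // requested values are rejected by the runtime.
        if (!ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200))
            Console.WriteLine("SetMinThreads rejected the requested values");
    }
}
```

Raising the minimum only avoids the thread-injection delay during spikes; it doesn't fix an underlying saturation problem.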
Another error that may help provide insight here:
This is the first time I've seen 0 of 10 available in our errors, but I think the most interesting thing here is:
930 seconds is 15.5 minutes. No idea why the host would be shutting down the connection.
I'm having similar timeout issues with the 2.0+ libraries compared to 1.2. The issues seem to come from what looks like higher server load/server-side retries (it starts to fall apart for me around 2k connections and 20% server load on an Azure Redis C6 instance). The general errors I see are similar to the example below.

Based on this, the qs numbers seem really high. My understanding from the documentation is that these are outstanding requests that have not been processed. I'm trying to figure out how these numbers get so high, as based on my code, I don't think it allows creating more than 5k requests. Is there a way to clear out outstanding requests, or flush a connection under timeouts? I've looked through the source code, but it's not clear to me how the library clears out requests that have been "abandoned" by the client.

@mgravell: Is there a recommended setting for high loads with the 2.0 libraries? I'm experimenting with some different settings now, although the general consensus I'm coming to is that the retry settings that worked for 1.2 don't seem to behave well for us with the 2.0 libraries.
I'm having the same error in a website, but only during peak access (more requests than my average). I already increased the thread count in machine.config and even implemented a "connection pool" as described in https://stackexchange.github.io/StackExchange.Redis/Timeouts, but no luck with that. Here is a sample of the errors:
I noticed the "in" amount in one of the log messages (in: 65536) is exactly the same as @CDargis's (!?). Does that mean something? Any ideas @mgravell?
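For anyone curious what the "connection pool" workaround mentioned in these comments might look like, here is a hedged sketch: a fixed set of multiplexers handed out round-robin so one saturated socket doesn't back up every request. The `RedisPool` type and its members are hypothetical, not part of StackExchange.Redis:

```csharp
// Sketch of a simple round-robin multiplexer pool (illustrative names).
using System.Threading;
using StackExchange.Redis;

public sealed class RedisPool
{
    private readonly ConnectionMultiplexer[] _connections;
    private int _next = -1;

    public RedisPool(string configuration, int size = 4)
    {
        _connections = new ConnectionMultiplexer[size];
        for (int i = 0; i < size; i++)
            _connections[i] = ConnectionMultiplexer.Connect(configuration);
    }

    public IDatabase Get()
    {
        // Round-robin across the pool; Interlocked keeps the index
        // thread-safe, and the uint cast handles overflow wraparound.
        int index = (int)((uint)Interlocked.Increment(ref _next) % _connections.Length);
        return _connections[index].GetDatabase();
    }
}
```

Note that the library is designed around a single shared multiplexer, so pooling like this is a workaround, not an officially recommended pattern.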
I've also implemented connection pooling, and it doesn't seem to help much. We have a typical cache-aside pattern with Redis/Dynamo. What I noticed is that we experience the timeouts when large amounts of async work come in.
We are experiencing similar issues in v2. Our Redis is a pretty over-provisioned ElastiCache cluster. Workload is mostly cache-aside with small items (only a few KB at most). I've noticed the errors come in three flavors. Here are some representative samples:
We have hundreds of containers in production all talking to the same ElastiCache cluster, but these connection errors only occur on one host at a time. I.e., it's not a general Redis outage; it's something localized to a specific container.
Any update on this? There seems to be a pretty noticeable increase in connection issues reported for v2.x. To expand on my previous comment, the primary issue isn't that we see a handful of connection errors/timeouts here and there. It's that we'll often see a single container suddenly start spewing hundreds of these connection errors. SE.Redis fails to recover, and the only resolution is to kill the container.
We have the same scenario as @hpk. Is there any way to fix this?
Got the same issue here: the cache was working fine with 1.2.6, but is continually falling over with 2+.
Hello, same kind of issue in 2.0, but I've only had this issue since I started using ElastiCache with Redis 4.0.
I tried adding way more vCPUs, but the issue is the same.
Hi, I'm experiencing the same issue in a .NET Core 2.1 project. Any idea on this issue? I'm not seeing a lot; on average around 5 to 10 per day.
We are seeing this on AWS Redis 5.
Hello all, all of a sudden our production server (Azure) is throwing these exception messages:

Exception category 1: "Timeout performing EVAL, inst: 1, mgr: Inactive, err: never, queue: 17, qu: 0, qs: 17, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName......"

Exception category 2: "No connection is available to service this operation: EVAL; A connection attempt failed because the connected party did not properly respond after a period of time..."

Is this something related to thread-pool throttling?
Same problem here: Exception occurred: Timeout performing HGET (5000ms), next: HGET ....com, inst: 3, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/....redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD..., IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=8183,Min=1,Max=8191), v: 2.0.601.3402
Getting random errors - |
Hi all; I'm sorry for delays getting back onto here - lots of plates spinning, and I simply haven't been available. I'm ramping up some investigations into timeouts, so: if this is still impacting you, I'm genuinely here again! Some things that I'm keenly interested in for all of these things, because they radically change the moving pieces:
(For example, on the "rs" one: yesterday I was talking to someone who was seeing rs: CompletePendingMessage, which sounds like there may be a TPL issue re thread-stealing; I'm investigating that possibility currently.)
Hi @mgravell, nice to see that you're back.
I'm on macOS, .NET Core 3.1.101.
For me the rs value seems to always be ReadAsync. |
OS: Windows 2012 R2. rs seems to be ReadAsync most of the time. Here are a few of the different logged errors:
@jasoncono the first one looks like a genuine server (or network) stall (nothing inbound or outbound); the other two... could be just about anything, annoyingly; yet more data points, at least... |
To follow up on my earlier comment, below are some more error examples. We're connecting to a Redis cluster that never sees higher than 15% CPU usage. No TLS is involved, and there's nothing in the Redis slowlog to indicate that the server is getting overwhelmed. We have a couple environments, and see these errors in both: OS: Windows Server 2019 in a Docker image on AWS ECS
OS: Windows & Linux, running in Docker images on AWS EKS
The nodes indicated by
Happy to provide any more information you need; just let me know!
Endpoint: AWS ElastiCache Redis
.NET framework 4.7.2
Yes. AWS configuration of Redis says "Encryption in-transit: Yes"
I do not see this in the logs
The metrics on the server seemed normal.
We're exploring a theory on a now-known cause and could use your help. For anyone experiencing issues on a .NET Framework application - are you in either of the following cases?

.NET Framework web projects: in your config, either
<add key="aspnet:UseTaskFriendlySynchronizationContext" value="false" />
or something less than 4.5 in
<httpRuntime targetFramework="4.x" />

.NET Framework non-web projects: in
To clarify: it doesn't matter what your
If this matches anyone here, it'd be hugely helpful to know. @mgravell is working on a workaround for this scenario.
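For reference, a sketch of where the two settings being asked about live in a .NET Framework web project's configuration file. The values shown here are the "safe" ones, i.e. what you would want to confirm you have; the 4.7.2 target framework is purely illustrative:

```xml
<!-- Hedged example: the problem case described above is value="false"
     on the app setting, or a targetFramework below 4.5. -->
<configuration>
  <appSettings>
    <!-- true (or the key being absent entirely) is fine -->
    <add key="aspnet:UseTaskFriendlySynchronizationContext" value="true" />
  </appSettings>
  <system.web>
    <!-- 4.5 or higher avoids the legacy synchronization context -->
    <httpRuntime targetFramework="4.7.2" />
  </system.web>
</configuration>
```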
Hey @NickCraver, @mgravell. We have aspnet:UseTaskFriendlySynchronizationContext set to true. I can't see that we have httpRuntime set.

We were having issues with version 1.2.6 in production. Occasionally the queue on one of our servers would stop processing and the queue number kept incrementing. We put a fudge in, so that if we get x amount of these consistently we recreate the multiplexer. Looking around your project we saw you had rewritten this, so we upgraded to the latest version. That upgrade is currently failing in QA, as we get similar issues under load.

"StackExchange.Redis.RedisTimeoutException: Timeout performing GET (5000ms), inst: 0, qu: 1187, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: Unspecified/removed, mgr: 10 of 10 available, clientName: Removed IOCP: (Busy=0,Free=1000,Min=100,Max=1000), WORKER: (Busy=5,Free=32762,Min=600,Max=32767), v: 2.0.601.3402"

We too are using AWS ElastiCache. I am not sure how much time you have dedicated to this project, but I could probably recreate this with you in our staging environment, if that would help.
I have a similar issue. It worked fine for 4 months, and suddenly I get this error:
@hshazli your workers just hit 210 and your min was 200. Do you think increasing your min workers would help? I'm not an expert in this, but you could try it.
@afinzel but that's only one error message; in other error messages the workers hit 300 or above. What min value should I use, and how do I select the proper one?
@hshazli Yep, that's normal. In your case, it seems to be a very overloaded thread pool that wasn't able to pull bytes off the queue fast enough during a spike. It's possible network blips resulting in spikes of incoming traffic contributed to that.

@afinzel I believe your issue is different (#1120), but there's a fix queued in #1374 for it. Does yours happen after a network blip of some sort (even AWS failing over)? That's the situation there; the connection's hung in a bad state afterwards.
@NickCraver, I didn't think so but I see that changed is merged, so I will get the latest version and retest. Thanks. edit: It looks like it isn't on Nuget yet. |
Hi, I've been using Redis for over a month without problems, but I just got this error yesterday. Logged:
Info:
I hope that's helpful, thanks.
@NickCraver, just wanted to clarify here: is the following fine in my App.config?
According to this Microsoft documentation, any .NET Framework version from 4.0-4.8 should use "v4.0". Your comment had seemed to imply that we could have a value greater than
Hi there, also getting a similar error. This is running on an Azure Function, built with .NET Core 2.2 using version 2.1.39.

System.Exception: Error fetching customer tags for devices 'xxxx'. ---> StackExchange.Redis.RedisTimeoutException: Timeout performing EXISTS (5000ms), next: EXISTS xxxxxxx, inst: 1, qu: 0, qs: 16, aw: False, rs: ReadAsync, ws: Idle, in: 850, serverEndpoint: tttttt:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: xxxx, IOCP: (Busy=16,Free=984,Min=6,Max=1000), WORKER: (Busy=0,Free=32767,Min=6,Max=32767), v: 2.1.39.39788 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Any help appreciated. Cheers,
I get the errors below when I try to get or set big data (about 70 megabytes). Any idea? Can I increase the Redis timeout? I use Redis on Windows. What framework exactly? .NET Framework 4.6.1. Is there TLS involved? No.
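On the "can I increase the Redis timeout?" question above: the SyncTimeout and AsyncTimeout properties on ConfigurationOptions control how long the client waits for an operation. A minimal sketch; the endpoint and millisecond values are placeholders:

```csharp
// Sketch: raise the client-side timeouts for very large payloads.
using StackExchange.Redis;

class TimeoutConfig
{
    static ConnectionMultiplexer Connect()
    {
        var options = ConfigurationOptions.Parse("localhost:6379");
        options.SyncTimeout = 30000;   // ms allowed for synchronous operations
        options.AsyncTimeout = 30000;  // ms allowed for async operations (v2.x)
        return ConnectionMultiplexer.Connect(options);
    }
}
```

Raising timeouts only masks the underlying cost of moving 70 MB values; splitting the payload (per the 100k recommendation earlier in this thread) is the better fix.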
This issue lives on - 2 years later. @CDargis was definitely onto something with: "We only experience this when running many task / with (Task.WhenAll())" . In my case we see this when doing 1,000s of GETs while using Parallel.ForEach. Setting MaxDegreeOfParallelism = Environment.ProcessorCount was a big help - but does not fix everything. Has anyone else found a better solution? Active issue for 2 years now. |
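As a sketch of the throttling mitigation described above: bounding the concurrency of a burst of GETs instead of firing thousands at once. A SemaphoreSlim gate is a common way to do this for async work (as an alternative to Parallel.ForEach, which is aimed at CPU-bound loops); the helper name and parameters here are illustrative, not part of the library:

```csharp
// Sketch: issue many StringGetAsync calls, but cap how many are in
// flight at once, so a burst can't flood the connection's queue.
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using StackExchange.Redis;

static class ThrottledGets
{
    public static async Task<RedisValue[]> GetManyAsync(
        IDatabaseAsync db, IEnumerable<RedisKey> keys, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = keys.Select(async key =>
        {
            await gate.WaitAsync();        // wait for a free slot
            try { return await db.StringGetAsync(key); }
            finally { gate.Release(); }    // free the slot for the next key
        });
        return await Task.WhenAll(tasks);
    }
}
```

A maxConcurrency in the low tens is a reasonable starting point to experiment with; tune against your own qs/in numbers.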
Found a fix here: change GetDatabase to return an IDatabaseAsync instead of an IDatabase, and the problem disappears.
@mgot90 |
Having the same exception without any obvious reason.
I set both |
@jeffras I'm just another dev trying to fix the same problem. Here's what we did:
In the end, we still have this happening every now and then. We're using Azure as our Redis provider. I haven't ruled out that our pricing tier is also related: https://azure.microsoft.com/en-us/pricing/details/cache/ Not sure if it's useful to you, but here are the Azure best practices as well: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices Did you guys have luck or try anything different?
@mgot90 Thanks for the info. Did you modify the StackExchange code for IDatabaseAsync? I did see it in the source but didn't want to maintain a branch to accomplish this. We still have the problem. We ended up having to wrap our code in a lock and it went away, but our throughput clearly suffers.
We've been dealing with these kinds of errors for months (if not years). Three months ago we decided to reimplement our caching layer with the following design assumptions:
I'm aware that this approach might not be ideal for everyone and has some performance overhead (as you need to check the size), but at least it may shed some more light. Our setup:
Going to close this out to clean up - we had lots of various cases leading to timeouts in a myriad of environments and combinations. Many of these reports led to tweaks, some improved handling, and much better error messages giving us more information. I want to close this old issue out so that anyone hitting something today, on a much newer version with a much newer error message, can more easily get help.
Hi. Using version 2.0.495. I've seen this page and have found some great information. I have a question about some of the information in the timeout exception message. As an example:
I understand that Busy > Min = waiting 500ms for a new thread to get added to the pool. What I am wondering is why mgr: 10 of 10 available - shouldn't the mgr count read 0 of 10 available, since we are in the global thread pool? In pretty much all of our timeout errors there are at least 9 of 10 available (and we have A LOT of timeouts). I've seen this with both worker and IOCP threads being > the min.