Recurrent RedisTimeoutException #4376

Open
MatthewSteeples opened this issue May 28, 2019 · 6 comments

MatthewSteeples commented May 28, 2019

Expected behavior

Connection is re-initiated

Actual behavior

When we receive a RedisTimeoutException from SignalR, we suspect that every subsequent request to SignalR throws the same exception. We've come to this conclusion because the exception message is always identical (down to the millisecond), and the elapsed time it reports doesn't actually pass before the exception is thrown (when we're sending frequent messages): a subsequent request returns immediately, yet claims it has timed out.

Example exception message:

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5718ms elapsed, timeout is 5000ms), command=EVAL, next: EVAL, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 1161, serverEndpoint: Unspecified/xxxxxx.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0003FF7DC5E9, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=10,Free=8181,Min=4,Max=8191), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

As far as I can see, all of these numbers look fine and you can see that the server is not under load (which is a different issue).

To be clear, I'm not after the cause of the timeouts here; I just want to get to the bottom of why a single timeout stops any further messages from being sent.
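For illustration, here's a minimal sketch of the kind of send pipeline that would produce exactly this symptom. This is speculation on my part, not SignalR's actual ScaleoutStream code, and the class and method names are made up:

using System;
using System.Threading.Tasks;

// Hypothetical pipeline that chains every send onto the previous send's task.
// Once one send faults (e.g. with a RedisTimeoutException), every later send
// awaits the same cached faulted task and throws the identical exception
// immediately, even though Redis itself has long since recovered.
class ChainedSendPipeline
{
    private Task _lastSend = Task.CompletedTask;

    public Task Send(Func<Task> publishToRedis)
    {
        _lastSend = _lastSend
            .ContinueWith(previous =>
            {
                // Rethrows the previous send's exception, so the chain never heals.
                previous.GetAwaiter().GetResult();
                return publishToRedis();
            }, TaskContinuationOptions.ExecuteSynchronously)
            .Unwrap();
        return _lastSend;
    }
}

Something along these lines would match both observations: the failure comes back instantly rather than after the configured 5000ms, and the exception message (elapsed time included) is byte-for-byte identical.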

Steps to reproduce

We're seeing this quite frequently, but haven't yet reproduced it outside of our production environment. If there's not enough detail here then I'll try to reproduce it. We're currently using SignalR 2.4 with the StackExchange.Redis variant of the backplane.
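In case it helps anyone attempt a repro, this is roughly how our backplane is wired up. The connection string and event key below are placeholders, and the exact UseStackExchangeRedis overload is from memory, so check the Microsoft.AspNet.SignalR.StackExchangeRedis package if it doesn't compile as-is:

using Microsoft.AspNet.SignalR;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Azure Cache for Redis over SSL on port 6380, matching the endpoint in
        // the exception message above. Connection string values are placeholders.
        GlobalHost.DependencyResolver.UseStackExchangeRedis(
            "xxxxxx.redis.cache.windows.net:6380,password=<key>,ssl=true,abortConnect=false",
            "SignalRBackplane");

        app.MapSignalR();
    }
}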

@MatthewSteeples (Author)

Stacktrace:

System.AggregateException:
StackExchange.Redis.RedisTimeoutException:
at Microsoft.AspNet.SignalR.Messaging.ScaleoutStream.Send
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Owin.OwinWebSocketHandler+<>c__DisplayClass5_0+<b__0>d.MoveNext
at Microsoft.AspNet.SignalR.Messaging.ScaleoutStreamManager.Send
at Microsoft.AspNet.SignalR.Messaging.ScaleoutMessageBus.Send
at Microsoft.AspNet.SignalR.Messaging.ScaleoutMessageBus.Publish
at Microsoft.AspNet.SignalR.Infrastructure.Connection.Send
at Microsoft.AspNet.SignalR.ConnectionExtensions.Send
at Microsoft.AspNet.SignalR.GroupManager.Add
at Ledgerscope.Web.Shared.SignalR.Hubs.JobChangedNotificationHub+d__6.MoveNext
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Hubs.HubPipelineModule+<>c__DisplayClass0_0+<b__0>d.MoveNext
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Hubs.HubPipelineModule+<>c__DisplayClass0_0+<b__0>d.MoveNext

@analogrelay analogrelay added the triage-review This issue needs to be reviewed during triage label Jun 11, 2019
@analogrelay (Contributor)

We have seen this kind of behavior before. I'll put this on our list to look at further. Be aware that our team's priority is ASP.NET Core SignalR, but we will take a look.

@timavaza

Hi Matthew & Andrew,
We are having the same problem: timeout exceptions on EVAL requests via StackExchange.Redis.

I did receive this piece of advice from a tech at a Redis host. Hoping it might be relevant and help someone work out this issue:

"The timeout error shows that you are running the Redis EVAL command against the Redis node 'redisbackplaneprod1'. This command is used to execute a Lua script server-side as per the following documentation:

https://redis.io/commands/eval

In this case, the timeout could indicate that the Lua script is taking a long time to process on the Redis node. When a Redis node is running a Lua script, it will not run anything else, which means it cannot serve other requests, resulting in a timeout. The Redis SLOWLOG system will log any requests that take longer than a specified execution time. I highly recommend using SLOWLOG to confirm whether or not the timeouts are a result of queries on the Redis node taking a long time to complete.

Documentation regarding reading and configuring SLOWLOG can be found at the following link:

https://redis.io/commands/slowlog"

But they also told me our Redis server was not heavily loaded, and we only have 120 active users on our site, so maybe it's an issue with the EVAL implementation?
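For what it's worth, the SLOWLOG check they suggested can also be run from the client with StackExchange.Redis, along these lines. The endpoint and password are placeholders, and the CommandTrace property names are from memory, so treat this as a sketch rather than a drop-in tool:

using System;
using System.Linq;
using StackExchange.Redis;

class SlowlogCheck
{
    static void Main()
    {
        // SLOWLOG is a server-level command, so we need an IServer, not an IDatabase.
        var muxer = ConnectionMultiplexer.Connect(
            "xxxxxx.redis.cache.windows.net:6380,password=<key>,ssl=true,abortConnect=false");
        var server = muxer.GetServer(muxer.GetEndPoints().First());

        // Print the most recent slow entries; a long-running EVAL would show up here.
        foreach (var entry in server.SlowlogGet(10))
        {
            Console.WriteLine(
                $"{entry.Time:o}  {entry.Duration.TotalMilliseconds}ms  {string.Join(" ", entry.Arguments)}");
        }
    }
}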

@analogrelay (Contributor)

The issue here isn't the timeout specifically though, as referenced in the text above. It's that SignalR is rethrowing the error even after the timeout issue is resolved. That shouldn't be happening.

@MatthewSteeples (Author)

Thanks for the reply. You're right that the problem is the recovery, not the timeout itself. Fully appreciate that this isn't a priority. We're probably about 12 months off our own migration to Core, but we'll get there :)

@analogrelay (Contributor)

Yeah, this is a "bug" though so we do want to look at it eventually :). Glad to hear you're looking to migrate!

@analogrelay analogrelay added this to the 2.4.x milestone Jun 27, 2019
@analogrelay analogrelay removed the triage-review This issue needs to be reviewed during triage label Jun 27, 2019