Recurrent RedisTimeoutException #4376

Open
MatthewSteeples opened this issue May 28, 2019 · 6 comments

MatthewSteeples commented May 28, 2019

Expected behavior

Connection is re-initiated

Actual behavior

When we receive a RedisTimeoutException from SignalR, we suspect that every subsequent request to SignalR throws the same exception. We've come to this conclusion because the exception message is always identical (down to the millisecond), and the elapsed time it reports doesn't actually pass before the exception is thrown (when we're sending frequent messages): a subsequent request returns immediately, yet claims it has timed out.

Example exception message:

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5718ms elapsed, timeout is 5000ms), command=EVAL, next: EVAL, inst: 0, qu: 0, qs: 2, aw: False, rs: ReadAsync, ws: Idle, in: 1161, serverEndpoint: Unspecified/xxxxxx.redis.cache.windows.net:6380, mgr: 10 of 10 available, clientName: RD0003FF7DC5E9, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=10,Free=8181,Min=4,Max=8191), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

As far as I can see, all of these numbers look fine and you can see that the server is not under load (which is a different issue).

To be clear, I'm not after the cause of the timeouts here; I just want to get to the bottom of why a single timeout stops any further messages from being sent.
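For illustration, here's a minimal sketch of the kind of send pipeline that would produce exactly this symptom. This is speculation on my part, not SignalR's actual ScaleoutStream code, and the class and method names are made up:

using System;
using System.Threading.Tasks;

// Hypothetical pipeline that chains every send onto the previous send's task.
// Once one send faults (e.g. with a RedisTimeoutException), every later send
// awaits the same cached faulted task and throws the identical exception
// immediately, even though Redis itself has long since recovered.
class ChainedSendPipeline
{
    private Task _lastSend = Task.CompletedTask;

    public Task Send(Func<Task> publishToRedis)
    {
        _lastSend = _lastSend
            .ContinueWith(previous =>
            {
                // Rethrows the previous send's exception, so the chain never heals.
                previous.GetAwaiter().GetResult();
                return publishToRedis();
            }, TaskContinuationOptions.ExecuteSynchronously)
            .Unwrap();
        return _lastSend;
    }
}

Something along these lines would match both observations: the failure comes back instantly rather than after the configured 5000ms, and the exception message (elapsed time included) is byte-for-byte identical.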

Steps to reproduce

We're seeing this quite frequently, but haven't yet reproduced it outside of our production environment. If there's not enough detail here then I'll try to reproduce it. We're currently using SignalR 2.4 with the StackExchange.Redis variant of the backplane.
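In case it helps anyone attempt a repro, this is roughly how our backplane is wired up. The connection string and event key below are placeholders, and the exact UseStackExchangeRedis overload is from memory, so check the Microsoft.AspNet.SignalR.StackExchangeRedis package if it doesn't compile as-is:

using Microsoft.AspNet.SignalR;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Azure Cache for Redis over SSL on port 6380, matching the endpoint in
        // the exception message above. Connection string values are placeholders.
        GlobalHost.DependencyResolver.UseStackExchangeRedis(
            "xxxxxx.redis.cache.windows.net:6380,password=<key>,ssl=true,abortConnect=false",
            "SignalRBackplane");

        app.MapSignalR();
    }
}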

@MatthewSteeples (Author)

Stacktrace:

System.AggregateException:
StackExchange.Redis.RedisTimeoutException:
at Microsoft.AspNet.SignalR.Messaging.ScaleoutStream.Send
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Owin.OwinWebSocketHandler+<>c__DisplayClass5_0+<b__0>d.MoveNext
at Microsoft.AspNet.SignalR.Messaging.ScaleoutStreamManager.Send
at Microsoft.AspNet.SignalR.Messaging.ScaleoutMessageBus.Send
at Microsoft.AspNet.SignalR.Messaging.ScaleoutMessageBus.Publish
at Microsoft.AspNet.SignalR.Infrastructure.Connection.Send
at Microsoft.AspNet.SignalR.ConnectionExtensions.Send
at Microsoft.AspNet.SignalR.GroupManager.Add
at Ledgerscope.Web.Shared.SignalR.Hubs.JobChangedNotificationHub+d__6.MoveNext
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Hubs.HubPipelineModule+<>c__DisplayClass0_0+<b__0>d.MoveNext
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
at Microsoft.AspNet.SignalR.Hubs.HubPipelineModule+<>c__DisplayClass0_0+<b__0>d.MoveNext

@analogrelay analogrelay added the triage-review This issue needs to be reviewed during triage label Jun 11, 2019
@analogrelay (Contributor)

We have seen this kind of behavior before. I'll put this on our list to look at further. Be aware that our team's priority is ASP.NET Core SignalR, but we will take a look.

@timavaza

Hi Matthew & Andrew,
We are having the same problem: timeout exceptions on EVAL requests via StackExchange.Redis.

I did receive this piece of advice from a tech at a Redis host. Hoping it might be relevant and help someone work out this issue:

"The timeout error shows that you are running the Redis EVAL command against the Redis node 'redisbackplaneprod1'. This command is used to execute a Lua script server-side as per the following documentation:

https://redis.io/commands/eval

In this case, the timeout could indicate that the Lua script is taking a long time to process on the Redis node. When a Redis node is running a Lua script, it will not run anything else, which means it cannot serve other requests, resulting in a timeout. The Redis SLOWLOG system will log any requests that take longer than a specified execution time. I highly recommend using SLOWLOG to confirm whether or not the timeouts are a result of queries on the Redis node taking a long time to complete.

Documentation regarding reading and configuring SLOWLOG can be found at the following link:

https://redis.io/commands/slowlog"

But they also told me our Redis server was not heavily loaded, and we only have 120 active users on our site, so maybe it's an issue with the EVAL implementation?
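For what it's worth, the SLOWLOG check they suggested can also be run from the client with StackExchange.Redis, along these lines. The endpoint and password are placeholders, and the CommandTrace property names are from memory, so treat this as a sketch rather than a drop-in tool:

using System;
using System.Linq;
using StackExchange.Redis;

class SlowlogCheck
{
    static void Main()
    {
        // SLOWLOG is a server-level command, so we need an IServer, not an IDatabase.
        var muxer = ConnectionMultiplexer.Connect(
            "xxxxxx.redis.cache.windows.net:6380,password=<key>,ssl=true,abortConnect=false");
        var server = muxer.GetServer(muxer.GetEndPoints().First());

        // Print the most recent slow entries; a long-running EVAL would show up here.
        foreach (var entry in server.SlowlogGet(10))
        {
            Console.WriteLine(
                $"{entry.Time:o}  {entry.Duration.TotalMilliseconds}ms  {string.Join(" ", entry.Arguments)}");
        }
    }
}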

@analogrelay (Contributor)

The issue here isn't the timeout specifically though, as referenced in the text above. It's that SignalR is rethrowing the error even after the timeout issue is resolved. That shouldn't be happening.

@MatthewSteeples (Author)

Thanks for the reply. You're right that the problem is the recovery, not the timeout itself. Fully appreciate that this isn't a priority. We're probably about 12 months off our own migration to Core, but we'll get there :)

@analogrelay (Contributor)

Yeah, this is a "bug" though so we do want to look at it eventually :). Glad to hear you're looking to migrate!

@analogrelay analogrelay added this to the 2.4.x milestone Jun 27, 2019
@analogrelay analogrelay removed the triage-review This issue needs to be reviewed during triage label Jun 27, 2019